In the current data-intensive era, the traditional hands-on method of conducting scientific research by exploring related publications to generate a testable hypothesis is well on its way to becoming obsolete within just a year or two. Analyzing the literature and data to automatically generate a hypothesis might become the de facto approach to inform the core research efforts of those trying to master the exponentially rapid expansion of publications and datasets. Here, viewpoints are provided and discussed to aid understanding of the challenges of data-driven discovery.
Stochastic differential equations (SDEs) are mathematical models that are widely used to describe complex processes or phenomena perturbed by random noise from different sources. Identifying the SDEs governing a system is often challenging because of the inherent strong stochasticity of the data and the complexity of the system's dynamics. The practical utility of existing parametric approaches for identifying SDEs is usually limited by insufficient data resources. This study presents a novel framework for identifying SDEs by leveraging the sparse Bayesian learning (SBL) technique to search for a parsimonious, yet physically necessary, representation in the space of candidate basis functions. More importantly, we use the analytical tractability of SBL to develop an efficient way to formulate the linear regression problem for the discovery of SDEs that requires considerably less time-series data. The effectiveness of the proposed framework is demonstrated using real data on stock and oil prices, bearing variation, and wind speed, as well as simulated data on well-known stochastic dynamical systems, including the generalized Wiener process and the Langevin equation. This framework aims to assist specialists in extracting stochastic mathematical models from random phenomena in the natural sciences, economics, and engineering for analysis, prediction, and decision making.
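A minimal sketch of the kind of sparse-regression formulation such a framework rests on, using scikit-learn's ARDRegression as an off-the-shelf sparse Bayesian learner over a polynomial candidate library; the simulated Langevin-type trajectory, the library choice, and the reporting threshold are illustrative assumptions, not the authors' implementation.

```python
# Sketch: identify the drift of a 1-D SDE dx = f(x) dt + g dW from one trajectory,
# using sparse Bayesian learning (ARD) over a polynomial candidate library.
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
dt, n = 1e-3, 200_000
x = np.empty(n)
x[0] = 1.0
for k in range(n - 1):                        # simulate a linear Langevin (OU) process
    drift = -1.5 * x[k]                       # true drift f(x) = -1.5 x
    x[k + 1] = x[k] + drift * dt + 0.5 * np.sqrt(dt) * rng.standard_normal()

dx_dt = np.diff(x) / dt                       # crude pointwise estimate of drift + noise
X = x[:-1]
library = np.column_stack([X ** p for p in range(4)])   # candidate basis 1, x, x^2, x^3
names = ["1", "x", "x^2", "x^3"]

sbl = ARDRegression(fit_intercept=False)      # sparse Bayesian learning via ARD
sbl.fit(library, dx_dt)
for name, w in zip(names, sbl.coef_):
    if abs(w) > 1e-2:                         # the x term should dominate, near -1.5
        print(f"drift term {name}: {w:+.3f}")
```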
Owing to the emergence of drug resistance and high morbidity, novel antiviral drugs with novel targets are highly sought after. Marine-derived compounds often possess potent antiviral activity and serve as a primary source for developing novel antiviral drugs, making the rapid discovery and evaluation of marine antiviral agents particularly crucial. Thus, future research should place greater emphasis on the identification of novel antiviral targets through the combination of artificial intelligence (AI) and structural pharmacology, as well as on expanding marine resource and target databases.
Purpose: The late Don R. Swanson was well appreciated during his lifetime as Dean of the Graduate Library School at the University of Chicago, as winner of the American Society for Information Science Award of Merit for 2000, and as author of many seminal articles. In this informal essay, I will give my personal perspective on Don's contributions to science, and outline some current and future directions in literature-based discovery that are rooted in concepts that he developed. Design/methodology/approach: Personal recollections and literature review. Findings: The Swanson A-B-C model of literature-based discovery has been successfully used by laboratory investigators analyzing their findings and hypotheses. It continues to be a fertile area of research in a wide range of application areas including text mining, drug repurposing, studies of scientific innovation, knowledge discovery in databases, and bioinformatics. Recently, additional modes of discovery that do not follow the A-B-C model have also been proposed and explored (e.g., so-called storytelling, gaps, analogies, link prediction, negative consensus, outliers, and revival of neglected or discarded research questions). Research limitations: This paper reflects the opinions of the author and is neither a comprehensive nor a technically based review of literature-based discovery. Practical implications: The general scientific public is still not aware of the availability of tools for literature-based discovery. Our Arrowsmith project site maintains a suite of discovery tools that are free and open to the public (http://arrowsmith.psych.uic.edu), as does BITOLA, which is maintained by Dmitar Hristovski (http://ibmi.mf.uni-lj.si/bitola), and Epiphanet, which is maintained by Trevor Cohen (http://epiphanet.uth.tme.edu/). Bringing user-friendly tools to the public should be a high priority, since even more than advancing basic research in informatics, it is vital that we ensure that scientists actually use discovery tools and that these actually help them make experimental discoveries in the lab and in the clinic. Originality/value: This paper discusses problems and issues which were inherent in Don's thoughts during his life, including those which have not yet been fully taken up and studied systematically.
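As a toy illustration of the A-B-C model (not drawn from the paper's tools), the snippet below searches a handful of title strings for bridge terms B that co-occur with A in one literature and with C in another while A and C never co-occur directly, echoing Swanson's classic fish-oil/Raynaud example; the titles and term list are hypothetical stand-ins for real bibliographic data.

```python
# Toy A-B-C literature-based discovery: find bridge terms B linking two
# disjoint literatures about A ("fish oil") and C ("raynaud").
from collections import defaultdict

titles = [
    "fish oil reduces blood viscosity",
    "fish oil inhibits platelet aggregation",
    "raynaud disease involves high blood viscosity",
    "platelet aggregation is elevated in raynaud disease",
]
a_term, c_term = "fish oil", "raynaud"
candidate_b = ("blood viscosity", "platelet aggregation")

cooc = defaultdict(set)                      # term -> set of titles mentioning it
for t in titles:
    for term in (a_term, c_term) + candidate_b:
        if term in t:
            cooc[term].add(t)

bridges = [
    b for b in candidate_b
    if cooc[b] & cooc[a_term] and cooc[b] & cooc[c_term]   # B co-occurs with both A and C
    and not (cooc[a_term] & cooc[c_term])                  # while A and C never co-occur
]
print("candidate B terms:", bridges)
```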
Tauopathies, diseases characterized by neuropathological aggregates of tau, including Alzheimer's disease and subtypes of frontotemporal dementia, make up the vast majority of dementia cases. Although there have been recent developments in tauopathy biomarkers and disease-modifying treatments, ongoing progress is required to ensure these are effective, economical, and accessible for the globally ageing population. As such, continued identification of new potential drug targets and biomarkers is critical. "Big data" studies, such as proteomics, can generate information on thousands of possible new targets for dementia diagnostics and therapeutics, but they currently remain underutilized due to the lack of a clear process by which targets are selected for future drug development. In this review, we discuss current tauopathy biomarkers and therapeutics, and highlight areas in need of improvement, particularly when addressing the needs of frail, comorbid, and cognitively impaired populations. We highlight biomarkers which have been developed from proteomic data, and outline possible future directions in this field. We propose new criteria by which potential targets in proteomics studies can be objectively ranked as favorable for drug development, and demonstrate their application to our group's recent tau interactome dataset as an example.
Mitigating vortex-induced vibrations (VIV) in flexible risers represents a critical concern in offshore oil and gas production, considering their potential impact on operational safety and efficiency. The accurate prediction of the displacement and position of VIV in flexible risers remains challenging under actual marine conditions. This study presents a data-driven model for riser displacement prediction that corresponds to field conditions. Experimental data analysis reveals that the XGBoost algorithm predicts the maximum displacement and position with superior accuracy compared with support vector regression (SVR), considering both computational efficiency and precision. Platform displacement in the Y-direction demonstrates a significant positive correlation with both axial depth and maximum displacement magnitude. The fourth point displacement exhibits the highest contribution to model prediction outcomes, showing a positive influence on maximum displacement while negatively affecting the axial depth of maximum displacement. Platform displacements in the X- and Y-directions exhibit competitive effects on both the riser's maximum displacement and its axial depth. Through the implementation of the XGBoost algorithm and SHapley Additive exPlanations (SHAP) analysis, the model effectively estimates the riser's maximum displacement and its precise location. This data-driven approach achieves predictions using minimal, readily available data points, enhancing its practical field applications and demonstrating clear relevance to academic and professional communities.
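A minimal sketch of the XGBoost-plus-SHAP workflow the abstract describes, run on synthetic stand-ins for platform-motion and riser measurements; the feature set, model settings, and data are assumptions for illustration only.

```python
# Fit an XGBoost regressor on synthetic "platform/riser" features and rank
# feature contributions with SHAP, mirroring the workflow described above.
import numpy as np
import xgboost as xgb
import shap

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))                  # stand-ins for platform X/Y motion, point displacements
y = 2.0 * X[:, 1] + 0.5 * X[:, 3] ** 2 + 0.1 * rng.normal(size=500)  # stand-in for max displacement

model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X, y)

explainer = shap.TreeExplainer(model)          # SHAP attribution for tree ensembles
shap_values = explainer.shap_values(X)
mean_abs = np.abs(shap_values).mean(axis=0)    # mean |SHAP| per feature = importance
print("mean |SHAP| per feature:", np.round(mean_abs, 3))
```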
Based on the educational evaluation reform, this study explores the construction of an evidence-based, data-driven value-added evaluation system, aiming to overcome the limitations of traditional evaluation methods. The research combines theoretical analysis with practical application and designs an evidence-based value-added evaluation framework whose core elements are a multi-source heterogeneous data acquisition and processing system, a value-added evaluation agent based on a large model, and an evaluation implementation and application mechanism. Empirical research verifies that the evaluation system has remarkable effects in improving learning participation, promoting ability development, and supporting teaching decision-making, and it provides a theoretical reference and practical path for educational evaluation reform in the new era. The research shows that a data-driven, evidence-based value-added evaluation system can reflect students' actual progress more fairly and objectively by accurately measuring differences in students' starting points and development ranges, providing strong support for the realization of high-quality education development.
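A hedged sketch of the underlying value-added idea: a student's value added is the gap between the observed outcome and the outcome predicted from baseline data alone. The baseline regression, score ranges, and variable names below are hypothetical stand-ins, not the paper's large-model evaluation agent.

```python
# Value-added measurement sketch: progress beyond what baseline data predicts.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
baseline = rng.normal(70, 10, size=(200, 1))            # starting-point measure (hypothetical)
post = 0.8 * baseline[:, 0] + rng.normal(15, 5, 200)    # observed end-of-term score

expected = LinearRegression().fit(baseline, post).predict(baseline)
value_added = post - expected                            # progress beyond expectation
print("students above expectation:", int((value_added > 0).sum()))
```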
Transformer models have emerged as pivotal tools within the realm of drug discovery, distinguished by their unique architectural features and exceptional performance in managing intricate data landscapes. Leveraging the innate capability of transformer architectures to comprehend the intricate hierarchical dependencies inherent in sequential data, these models showcase remarkable efficacy across various tasks, including new drug design and drug target identification. The adaptability of pre-trained transformer-based models renders them indispensable assets for driving data-centric advancements in drug discovery, chemistry, and biology, furnishing a robust framework that expedites innovation and discovery within these domains. Beyond their technical prowess, the success of transformer-based models in drug discovery, chemistry, and biology extends to their interdisciplinary potential, seamlessly combining biological, physical, chemical, and pharmacological insights to bridge gaps across diverse disciplines. This integrative approach not only enhances the depth and breadth of research endeavors but also fosters synergistic collaborations and the exchange of ideas among disparate fields. In our review, we elucidate the myriad applications of transformers in drug discovery, as well as chemistry and biology, spanning from protein design and protein engineering to molecular dynamics (MD), drug target identification, transformer-enabled drug virtual screening (VS), drug lead optimization, drug addiction, small data set challenges, chemical and biological image analysis, chemical language understanding, and single-cell data. Finally, we conclude the survey by deliberating on promising trends in transformer models within the context of drug discovery and other sciences.
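As a minimal, self-contained illustration of the pattern behind many of these applications, the sketch below encodes character-tokenized SMILES strings with a small Transformer encoder and regresses a property from the pooled representation; the vocabulary, model size, and task are toy assumptions, not any pretrained system surveyed here.

```python
# Toy SMILES -> property regressor built on a small Transformer encoder.
import torch
import torch.nn as nn

vocab = {c: i + 1 for i, c in enumerate("CNOclnoBrF=#()123456[]@+-Hs")}  # toy SMILES alphabet

def tokenize(smiles, max_len=64):
    ids = [vocab.get(c, 0) for c in smiles][:max_len]
    return torch.tensor(ids + [0] * (max_len - len(ids)))               # pad with 0

class SmilesRegressor(nn.Module):
    def __init__(self, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.embed = nn.Embedding(len(vocab) + 1, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))          # (batch, seq, d_model)
        return self.head(h.mean(dim=1)).squeeze(-1)   # mean-pool, then regress

model = SmilesRegressor()
batch = torch.stack([tokenize("CCO"), tokenize("c1ccccc1O")])  # ethanol, phenol
print(model(batch).shape)                                      # torch.Size([2])
```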
We propose an integrated method of data-driven and mechanism models for well logging formation evaluation, explicitly focusing on predicting reservoir parameters such as porosity and water saturation. Accurately interpreting these parameters is crucial for effectively exploring and developing oil and gas. However, with the increasing complexity of geological conditions in this industry, there is a growing demand for improved accuracy in reservoir parameter prediction, leading to higher costs associated with manual interpretation. Conventional logging interpretation methods rely on empirical relationships between logging data and reservoir parameters, which suffer from low interpretation efficiency and strong subjectivity and are suited only to idealized conditions. The application of artificial intelligence to the interpretation of logging data provides a new solution to the problems of traditional methods and is expected to improve interpretation accuracy and efficiency. If large and high-quality datasets exist, data-driven models can reveal relationships of arbitrary complexity. Nevertheless, constructing sufficiently large logging datasets with reliable labels remains challenging, making it difficult to apply data-driven models effectively in logging data interpretation. Furthermore, data-driven models often act as "black boxes" without explaining their predictions or ensuring compliance with primary physical constraints. This paper proposes a machine learning method with strong physical constraints by integrating mechanism and data-driven models. Prior knowledge of logging data interpretation is embedded into machine learning through the network structure, loss function, and optimization algorithm. We employ a Physically Informed Auto-Encoder (PIAE) to predict porosity and water saturation, which can be trained without labeled reservoir parameters using self-supervised learning techniques. This approach effectively achieves automated interpretation and facilitates generalization across diverse datasets.
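A hedged sketch of the physics-constrained idea: an encoder predicts porosity and water saturation from log curves, while the decoder is a fixed petrophysical forward model (a density mixing law and Archie's equation) that reconstructs observable logs, so training can proceed without labeled reservoir parameters. The constants, network sizes, and log set are illustrative assumptions, not the paper's PIAE.

```python
# Physics-constrained autoencoder sketch: encoder -> (porosity, Sw),
# fixed petrophysical forward model -> reconstructed logs, self-supervised loss.
import torch
import torch.nn as nn

RHO_MA, RHO_F = 2.65, 1.0                 # matrix / fluid density, g/cc (assumed)
A, RW, M, N_EXP = 1.0, 0.05, 2.0, 2.0     # Archie parameters (assumed)

class PhysicsAE(nn.Module):
    def __init__(self, n_logs=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_logs, 32), nn.ReLU(),
            nn.Linear(32, 2), nn.Sigmoid())            # outputs (porosity, Sw) in (0, 1)

    def forward(self, logs):
        phi, sw = self.encoder(logs).unbind(dim=1)
        rho_b = phi * RHO_F + (1 - phi) * RHO_MA        # density log from mixing law
        rt = A * RW / (phi.clamp(min=1e-3) ** M * sw.clamp(min=1e-3) ** N_EXP)  # Archie
        return phi, sw, torch.stack([rho_b, torch.log10(rt)], dim=1)

model = PhysicsAE()
logs = torch.randn(8, 4)                                # stand-in for normalized log curves
target = torch.randn(8, 2)                              # observed density and log10-resistivity
phi, sw, recon = model(logs)
loss = nn.functional.mse_loss(recon, target)            # self-supervised reconstruction loss
loss.backward()
```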
Against the backdrop of the national innovation strategy and the digital transformation of education, the traditional "extensive" training model for innovation and entrepreneurship talents struggles to meet the personalized development needs of students, making an urgent shift toward precision and intelligence necessary. This study constructs a data-centered, four-dimensional integrated framework, "Goal-Data-Intervention-Evaluation", and proposes a data-driven training model for innovation and entrepreneurship talents in universities. By collecting multi-source data such as learning behaviors, competency assessments, and practical projects, the model conducts in-depth analysis of students' individual characteristics and development potential, enabling precise decision-making in goal setting, teaching intervention, and practical guidance. Based on data analysis, a supportive system for personalized teaching and practical activities is established. Combined with process-oriented and summative evaluations, a closed-loop feedback mechanism is formed to improve training effectiveness. This model provides a theoretical framework and practical path for the scientific, personalized, and intelligent development of innovation and entrepreneurship education in universities.
A data-driven model of multiple variable cutting (M-VCUT) level set-based substructures is proposed for the topology optimization of lattice structures. The M-VCUT level set method is used to represent substructures, enriching their diversity of configuration while ensuring connectivity. To construct the data-driven model of substructures, a database is prepared by sampling the space of substructures spanned by several substructure prototypes. Then, for each substructure in this database, the stiffness matrix is condensed so that its degrees of freedom are reduced. Thereafter, the data-driven model of substructures is constructed through interpolation with compactly supported radial basis functions (CS-RBF). The inputs of the data-driven model are the design variables of the topology optimization, and the outputs are the condensed stiffness matrix and volume of the substructures. During optimization, this data-driven model is used, thus avoiding repeated static condensation that would require much computation time. Several numerical examples are provided to verify the proposed method.
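Two ingredients mentioned above can be sketched compactly: static (Guyan) condensation of a substructure stiffness matrix onto its boundary degrees of freedom, and Wendland C2 compactly supported RBF interpolation of a condensed entry over a design variable. The matrices, sample points, and support radius are illustrative assumptions, not the paper's database.

```python
# Static condensation + compactly supported RBF interpolation, in miniature.
import numpy as np

def condense(K, boundary):
    """Guyan reduction: eliminate interior DOFs of stiffness matrix K."""
    interior = np.setdiff1d(np.arange(K.shape[0]), boundary)
    Kbb = K[np.ix_(boundary, boundary)]
    Kbi = K[np.ix_(boundary, interior)]
    Kii = K[np.ix_(interior, interior)]
    return Kbb - Kbi @ np.linalg.solve(Kii, Kbi.T)

def wendland_c2(r):
    """Compactly supported RBF: zero for r >= 1."""
    return np.where(r < 1.0, (1 - r) ** 4 * (4 * r + 1), 0.0)

# condense a small spring-chain stiffness matrix onto its two end DOFs
K = np.diag([4.0, 4.0, 4.0, 4.0]) - np.eye(4, k=1) - np.eye(4, k=-1)
K_red = condense(K, boundary=np.array([0, 3]))
print("condensed 2x2 stiffness:\n", K_red)

# toy database: design variable x -> one condensed stiffness entry k(x)
x_samples = np.linspace(0.0, 1.0, 6)
k_samples = 1.0 + 0.5 * x_samples ** 2          # stand-in for a condensed entry
support = 0.6
Phi = wendland_c2(np.abs(x_samples[:, None] - x_samples[None, :]) / support)
weights = np.linalg.solve(Phi, k_samples)       # CS-RBF interpolation weights

x_query = 0.37
k_query = wendland_c2(np.abs(x_query - x_samples) / support) @ weights
print(f"interpolated condensed entry at x={x_query}: {k_query:.4f}")
```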
The outstanding comprehensive mechanical properties of newly developed hybrid lattice structures make them useful in engineering applications for bearing multiple mechanical loads. Additive-manufacturing technologies make it possible to fabricate these highly spatially programmable structures and greatly enhance the freedom in their design. However, traditional analytical methods do not sufficiently reflect the actual vibration-damping mechanism of lattice structures and are limited by their high computational cost. In this study, a hybrid lattice structure consisting of various cells was designed based on quasi-static and vibration experiments. Subsequently, a novel parametric design method based on a data-driven approach was developed for hybrid lattices with engineered properties. The response surface method was adopted to define the sensitive optimization target. A prediction model for the lattice geometric parameters and vibration properties was established using a backpropagation neural network. Then, it was integrated into the genetic algorithm to create the optimal hybrid lattice with varying geometric features and the required wide-band vibration-damping characteristics. Validation experiments were conducted, demonstrating that the optimized hybrid lattice can achieve the target properties. In addition, the data-driven parametric design method can reduce computation time and be widely applied to complex structural designs when analytical and empirical solutions are unavailable.
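A hedged sketch of the surrogate-plus-genetic-algorithm loop described above: a small neural network learns a geometry-to-damping mapping from sampled data, and a basic GA then searches the geometry space using that surrogate. The synthetic response and parameter ranges are stand-ins, not the paper's lattice model.

```python
# Neural-network surrogate + simple genetic algorithm search, in miniature.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
geom = rng.uniform(0.5, 2.0, size=(300, 3))                  # e.g., strut radius, cell size, ratio
damping = np.sin(geom[:, 0]) + 0.3 * geom[:, 1] - 0.1 * geom[:, 2] ** 2   # synthetic response

surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
surrogate.fit(geom, damping)

pop = rng.uniform(0.5, 2.0, size=(40, 3))                    # initial GA population
for _ in range(30):
    fitness = surrogate.predict(pop)
    parents = pop[np.argsort(fitness)[-20:]]                 # keep the best half
    crossover = parents[rng.permutation(20)] * 0.5 + parents * 0.5   # blend random pairs
    mutation = crossover + rng.normal(0, 0.05, crossover.shape)
    pop = np.clip(np.vstack([parents, mutation]), 0.5, 2.0)

best = pop[np.argmax(surrogate.predict(pop))]
print("best geometry found by GA on the surrogate:", np.round(best, 3))
```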
AI-driven materials databases are transforming research by integrating experimental and computational data to enhance discovery and optimization. Platforms such as the Digital Catalysis Platform (DigCat) and the Dynamic Database of Solid-State Electrolyte (DDSE) demonstrate how machine learning and predictive modeling can improve catalyst and solid-state electrolyte development. These databases facilitate data standardization, high-throughput screening, and cross-disciplinary collaboration, addressing key challenges in materials informatics. As AI techniques advance, materials databases are expected to play an increasingly vital role in accelerating research and innovation.
The Underwater Acoustic (UWA) channel is bandwidth-constrained and experiences doubly selective fading. It is challenging to acquire perfect channel knowledge for Orthogonal Frequency Division Multiplexing (OFDM) communications using a finite number of pilots. On the other hand, Deep Learning (DL) approaches have been very successful in wireless OFDM communications; however, whether they will work underwater remains an open question. For the first time, this paper compares two categories of DL-based UWA OFDM receivers: the Data-Driven (DD) method, which performs as an end-to-end black box, and the Model-Driven (MD) method, also known as the model-based data-driven method, which combines DL with expert OFDM receiver knowledge. The encoder-decoder framework and a Convolutional Neural Network (CNN) structure are employed to establish the DD receiver, while an unfolding-based Minimum Mean Square Error (MMSE) structure is adopted for the MD receiver. We analyze the characteristics of the different receivers by Monte Carlo simulations under diverse communications conditions and propose a strategy for selecting a proper receiver under different communication scenarios. Field trials in the pool and at sea are also conducted to verify the feasibility and advantages of the DL receivers. It is observed that DL receivers perform better than conventional receivers in terms of bit error rate.
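A minimal sketch of the model-driven flavor: a per-subcarrier MMSE equalizer whose noise-power term is a learnable parameter, i.e., a single unfolded layer that embeds classical receiver knowledge in a trainable module. The dimensions, modulation, and channel are toy assumptions rather than the paper's receiver.

```python
# One "unfolded" MMSE equalization layer with a learnable noise-power term.
import torch
import torch.nn as nn

class UnfoldedMMSE(nn.Module):
    def __init__(self):
        super().__init__()
        self.log_noise = nn.Parameter(torch.tensor(0.0))     # learnable noise power (log-scale)

    def forward(self, y, h):
        # y, h: complex tensors of shape (batch, n_subcarriers)
        sigma2 = torch.exp(self.log_noise)
        return torch.conj(h) * y / (h.abs() ** 2 + sigma2)   # classical per-subcarrier MMSE form

n_sc = 64
h = torch.randn(8, n_sc, dtype=torch.cfloat)                 # stand-in channel estimates
x = (torch.randint(0, 2, (8, n_sc)) * 2 - 1).to(torch.cfloat)   # BPSK symbols
y = h * x + 0.1 * torch.randn(8, n_sc, dtype=torch.cfloat)   # received subcarrier samples

rx = UnfoldedMMSE()
x_hat = rx(y, h)
loss = (x_hat - x).abs().pow(2).mean()                       # train the unfolded parameter
loss.backward()
```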
For control systems with unknown model parameters, this paper proposes a data-driven iterative learning method for fault estimation. First, input and output data from the system under fault-free conditions are collected. By applying orthogonal triangular decomposition and singular value decomposition, a data-driven realization of the system's kernel representation is derived, and a residual generator is constructed based on this representation. Then, the actuator fault signal is estimated online by analyzing the system's dynamic residual, and an iterative learning algorithm is introduced to continuously optimize the residual-based performance function, thereby enhancing estimation accuracy. The proposed method achieves actuator fault estimation without requiring knowledge of model parameters, eliminating the time-consuming system modeling process and allowing operators to focus on system optimization and decision-making. Compared with existing fault estimation methods, the proposed method demonstrates superior transient performance, steady-state performance, and real-time capability, reduces the need for manual intervention, and lowers operational complexity. Finally, experimental results on a mobile robot verify the effectiveness and advantages of the method.
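A hedged sketch of the data-driven residual generator: Hankel matrices of fault-free inputs and outputs are stacked, the left null space obtained by SVD serves as a kernel representation, and the resulting parity vectors produce residuals that stay near zero until an actuator fault appears. The simulated first-order plant, window length, and fault are illustrative assumptions, not the paper's algorithm.

```python
# Data-driven kernel representation (parity space) from fault-free I/O data,
# then a residual that reacts to an actuator offset fault.
import numpy as np

def hankel(signal, rows):
    cols = len(signal) - rows + 1
    return np.array([signal[i:i + cols] for i in range(rows)])

rng = np.random.default_rng(4)
n = 400
u = rng.normal(size=n)
y = np.zeros(n)
for k in range(1, n):                          # unknown first-order plant, fault-free data
    y[k] = 0.9 * y[k - 1] + 0.5 * u[k - 1]

s = 5                                          # parity-space window length
Z = np.vstack([hankel(u, s), hankel(y, s)])    # stacked input/output Hankel matrix
_, sv, Vt = np.linalg.svd(Z.T, full_matrices=False)
kernel = Vt[np.sum(sv > 1e-8 * sv[0]):]        # rows spanning the left null space of Z

# replay the plant with an actuator offset fault after k = 200, commands unchanged
y_f = np.zeros(n)
for k in range(1, n):
    u_act = u[k - 1] + (1.0 if k - 1 >= 200 else 0.0)
    y_f[k] = 0.9 * y_f[k - 1] + 0.5 * u_act

z_win = np.concatenate([u[300:300 + s], y_f[300:300 + s]])   # commanded u, measured y
print("residual norm under fault:", np.linalg.norm(kernel @ z_win))
```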
When assessing seismic liquefaction potential with data-driven models, addressing the uncertainties of establishing models, interpreting cone penetration test (CPT) data, and setting decision thresholds is crucial for avoiding biased data selection, ameliorating overconfident models, and remaining flexible to varying practical objectives, especially when the training and testing data are not identically distributed. A workflow characterized by leveraging Bayesian methodology was proposed to address these issues. Employing a Multi-Layer Perceptron (MLP) as the foundational model, this approach was benchmarked against empirical methods and advanced algorithms for its efficacy in simplicity, accuracy, and resistance to overfitting. The analysis revealed that, while MLP models optimized via the maximum a posteriori algorithm suffice for straightforward scenarios, Bayesian neural networks show great potential for preventing overfitting. Additionally, integrating decision thresholds through various evaluative principles offers insights for challenging decisions. Two case studies demonstrate the framework's capacity for nuanced interpretation of in situ data, employing a model committee for a detailed evaluation of liquefaction potential via Monte Carlo simulations and basic statistics. Overall, the proposed step-by-step workflow for analyzing seismic liquefaction incorporates multifold testing and real-world data validation, showing improved robustness against overfitting and greater versatility in addressing practical challenges. This research contributes to the seismic liquefaction assessment field by providing a structured, adaptable methodology for accurate and reliable analysis.
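A minimal sketch of the model-committee idea using Monte Carlo dropout: keeping dropout active at prediction time yields an ensemble of liquefaction probabilities whose mean and spread can be compared against a chosen decision threshold. The features, untrained network, and threshold are placeholders for CPT-derived inputs and a calibrated model.

```python
# Monte Carlo dropout committee for a probabilistic liquefaction flag.
import torch
import torch.nn as nn

class DropoutMLP(nn.Module):
    def __init__(self, n_features=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(32, 32), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(32, 1))

    def forward(self, x):
        return torch.sigmoid(self.net(x)).squeeze(-1)

model = DropoutMLP()
model.train()                                   # keep dropout active for MC sampling
x_site = torch.randn(1, 4)                      # stand-in for normalized CPT features

with torch.no_grad():
    probs = torch.stack([model(x_site) for _ in range(200)])   # MC committee of predictions

mean_p, std_p = probs.mean().item(), probs.std().item()
threshold = 0.5                                 # decision threshold; adjust to the objective
print(f"P(liquefaction) = {mean_p:.2f} +/- {std_p:.2f}, flagged: {mean_p > threshold}")
```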
In the rapidly evolving technological landscape, state-owned enterprises (SOEs) encounter significant challenges in sustaining their competitiveness through efficient R&D management. Integrated Product Development (IPD), with its emphasis on cross-functional teamwork, concurrent engineering, and data-driven decision-making, has been widely recognized for enhancing R&D efficiency and product quality. However, the unique characteristics of SOEs pose challenges to the effective implementation of IPD. The advancement of big data and artificial intelligence technologies offers new opportunities for optimizing IPD R&D management through data-driven decision-making models. This paper constructs and validates a data-driven decision-making model tailored to the IPD R&D management of SOEs. By integrating data mining, machine learning, and other advanced analytical techniques, the model serves as a scientific and efficient decision-making tool. It aids SOEs in optimizing R&D resource allocation, shortening product development cycles, reducing R&D costs, and improving product quality and innovation. Moreover, this study contributes to a deeper theoretical understanding of the value of data-driven decision-making in the context of IPD.
Hydraulic fracturing technology has achieved remarkable results in improving the production of tight gas reservoirs, but its effectiveness is governed by the joint action of multiple, complex factors. Traditional analysis methods have limitations in dealing with these complex and interrelated factors, and it is difficult for them to fully reveal the actual contribution of each factor to production. Machine learning-based methods explore the complex mapping relationships within large amounts of data to provide data-driven insights into the key factors driving production. In this study, a data-driven PCA-RF-VIM (Principal Component Analysis-Random Forest-Variable Importance Measures) approach to feature-importance analysis is proposed to identify the key factors driving post-fracturing production. Four types of parameters, including log parameters, geological and reservoir physical parameters, hydraulic fracturing design parameters, and reservoir stimulation parameters, were input into the PCA-RF-VIM model. The model was trained using 6-fold cross-validation and grid search, and the relative importance ranking of each factor was obtained. To verify the validity of the PCA-RF-VIM model, a consolidation model that uses three other independent data-driven methods (the Pearson correlation coefficient, RF feature-importance analysis, and XGBoost feature-importance analysis) was applied for comparison with the PCA-RF-VIM model. A comparison of the two models shows that they contain almost the same parameters in the top ten, with only minor differences in one parameter. In combination with the reservoir characteristics, the reasonableness of the PCA-RF-VIM model is verified, and the importance ranking of the parameters by this method is more consistent with the reservoir characteristics of the study area. Ultimately, ten parameters are selected as the controlling factors with the potential to influence post-fracturing gas production, as these top ten parameters account for 91.95% of the combined importance in driving natural gas production. Identifying these ten controlling factors provides engineers with new insight into reservoir selection for fracturing stimulation and fracturing-parameter optimization to improve fracturing efficiency and productivity.
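A hedged sketch of a PCA-plus-random-forest importance workflow of the kind described above: a pipeline tuned by grid search with 6-fold cross-validation, with random-forest importances on the principal components pushed back to the original variables through the absolute PCA loadings. The synthetic data and this back-mapping are illustrative assumptions, not the study's exact PCA-RF-VIM procedure.

```python
# PCA + random-forest pipeline with 6-fold grid search and a simple
# back-mapping of component importances to the original features.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 8))                   # stand-ins for logging/geology/fracturing inputs
y = 3 * X[:, 0] + 2 * X[:, 4] + rng.normal(0, 0.5, 300)   # stand-in for gas production

pipe = Pipeline([("pca", PCA(n_components=5)),
                 ("rf", RandomForestRegressor(random_state=0))])
grid = GridSearchCV(pipe,
                    {"rf__n_estimators": [100, 300], "rf__max_depth": [4, 8]},
                    cv=6)                        # 6-fold cross-validation, as in the abstract
grid.fit(X, y)

pca = grid.best_estimator_.named_steps["pca"]
rf = grid.best_estimator_.named_steps["rf"]
feat_importance = np.abs(pca.components_).T @ rf.feature_importances_   # back to original variables
ranking = np.argsort(feat_importance)[::-1]
print("feature ranking (most to least important):", ranking)
```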