Human motion modeling is a core technology in computer animation, game development, and human-computer interaction. In particular, generating natural and coherent in-between motion using only the initial and terminal frames remains a fundamental yet unresolved challenge. Existing methods typically rely on dense keyframe inputs or complex prior structures, making it difficult to balance motion quality and plausibility under conditions such as sparse constraints, long-term dependencies, and diverse motion styles. To address this, we propose a motion generation framework based on a frequency-domain diffusion model, which aims to better model complex motion distributions and enhance generation stability under sparse conditions. Our method maps motion sequences to the frequency domain via the Discrete Cosine Transform (DCT), enabling more effective modeling of low-frequency motion structures while suppressing high-frequency noise. A denoising network based on self-attention is introduced to capture long-range temporal dependencies and improve global structural awareness. Additionally, a multi-objective loss function is employed to jointly optimize motion smoothness, pose diversity, and anatomical consistency, enhancing the realism and physical plausibility of the generated sequences. Comparative experiments on the Human3.6M and LaFAN1 datasets demonstrate that our method outperforms state-of-the-art approaches across multiple performance metrics, showing stronger capabilities in generating intermediate motion frames. This research offers a new perspective and methodology for human motion generation and holds promise for applications in character animation, game development, and virtual interaction.
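As a rough illustration of the frequency-domain mapping described in the abstract above, the sketch below applies the DCT to a motion sequence and keeps only its low-frequency coefficients. It is a minimal NumPy/SciPy sketch, not the paper's implementation: the (frames x channels) layout and the number of retained coefficients (`keep`) are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct, idct

def to_frequency_domain(motion, keep=16):
    """Map a motion sequence (T frames x D joint channels) to DCT coefficients,
    keeping only the first `keep` low-frequency components per channel."""
    coeffs = dct(motion, type=2, axis=0, norm="ortho")   # (T, D) time -> (T, D) spectrum
    coeffs[keep:] = 0.0                                   # suppress high-frequency content
    return coeffs

def to_time_domain(coeffs):
    """Invert the (possibly truncated) DCT back to a motion sequence."""
    return idct(coeffs, type=2, axis=0, norm="ortho")

# toy example: 60 frames, 66 channels (e.g., 22 joints x 3D positions)
motion = np.random.randn(60, 66)
freq = to_frequency_domain(motion, keep=16)
recon = to_time_domain(freq)               # smooth, low-frequency approximation of the motion
print(recon.shape, np.abs(motion - recon).mean())
```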
Aim: The purpose of this study was to develop a mathematical model to quantitatively describe the passive transport of macromolecules within dental biofilms. Methodology: Fluorescently labeled dextrans with different molecular masses (3 kD, 10 kD, 40 kD, 70 kD, 2 000 kD) were used as a series of diffusion probes. Streptococcus mutans, Streptococcus sanguinis, Actinomyces naeslundii, and Fusobacterium nucleatum were used as inocula for biofilm formation. The diffusion of the different probes through the in vitro biofilm was recorded with a confocal laser microscope. Results: A mathematical function describing biofilm penetration was constructed on the basis of the inverse problem method. Based on this function, both the relationship between the steady-state average concentration and molecular weight and the relationship between penetration time and molecular weight can be analyzed. Conclusion: The model can be used to predict the effective concentration and the penetration time of anti-biofilm medicines that diffuse through oral biofilm. Furthermore, an improved model for large molecules is proposed by considering the exchange time at the upper boundary of the dental biofilm.
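The abstract does not reproduce the fitted penetration function itself. For orientation, passive transport of a solute into a biofilm of thickness $L$ is classically described by Fick's second law with a fixed bulk concentration at the upper boundary, which is the usual starting point for such inverse-problem fits (the authors' final function may differ):

$$\frac{\partial C(z,t)}{\partial t} = D\,\frac{\partial^{2} C(z,t)}{\partial z^{2}}, \qquad C(z,0)=0,\quad C(0,t)=C_{0},\quad \left.\frac{\partial C}{\partial z}\right|_{z=L}=0,$$

where $D$ is the effective diffusion coefficient (expected to decrease with molecular mass) and $C_{0}$ is the probe concentration in the bulk medium above the biofilm.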
In this study, the removal of monovalent and divalent cations (Na+, K+, Mg2+, and Ca2+) from a diluted solution of Chott-El Jerid Lake, Tunisia, was investigated with the electrodialysis technique. The process was tested using two cation-exchange membranes: sulfonated polyether sulfone cross-linked with 10% hexamethylenediamine (HEXCl) and sulfonated polyether sulfone grafted with octylamine (S-PESOS). The commercially available membrane Nafion® was used for comparison. The results showed that the Nafion® and S-PESOS membranes had similar removal behaviors, and the investigated cations were ranked in the following descending order of demineralization rate: Na+ > Ca2+ > Mg2+ > K+. With HEXCl, divalent cations were removed more effectively than monovalent cations. The plots based on the Weber-Morris model showed strong linearity. This reveals that intra-particle diffusion was not the rate-determining step and that the removal process was controlled by two or more concurrent mechanisms. The Boyd plots did not pass through the origin, and the sole controlling step was determined to be film-diffusion resistance, especially after a long period of electrodialysis. Additionally, a semi-empirical model was established to simulate the temporal variation of the treatment process, and the physical significance and values of the model parameters were compared for the three membranes. The findings of this study indicate that HEXCl and S-PESOS membranes can be efficiently utilized for water softening, especially when effluents are highly loaded with calcium and magnesium ions.
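The Weber-Morris and Boyd analyses mentioned above follow standard kinetic forms. The short sketch below fits the Weber-Morris intraparticle diffusion line q_t = k_id * sqrt(t) + C to illustrative data (the numbers are not from the study); a non-zero intercept, as reported here, indicates that intraparticle diffusion is not the sole rate-limiting mechanism.

```python
import numpy as np

# Weber-Morris intraparticle diffusion model: q_t = k_id * sqrt(t) + C.
# A straight q_t vs sqrt(t) plot passing through the origin (C ~ 0) would indicate
# intraparticle diffusion as the sole rate-limiting step; a non-zero intercept
# points to additional controlling mechanisms such as film diffusion.
t = np.array([5, 10, 20, 40, 60, 90, 120.0])        # elapsed time, min (illustrative values)
q = np.array([0.8, 1.3, 2.0, 2.9, 3.4, 3.9, 4.2])   # amount removed (illustrative values)

sqrt_t = np.sqrt(t)
k_id, C = np.polyfit(sqrt_t, q, 1)                   # slope = rate constant, intercept = C
r2 = np.corrcoef(sqrt_t, q)[0, 1] ** 2               # linearity of the Weber-Morris plot
print(f"k_id = {k_id:.3f}, C = {C:.3f}, R^2 = {r2:.3f}")
```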
Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems (ITS) in the real world. As a state-of-the-art generative model, the diffusion model has proven highly successful in image generation, speech generation, time series modelling, etc., and now opens a new avenue for traffic data imputation. In this paper, we propose a conditional diffusion model, called the implicit-explicit diffusion model, for traffic data imputation. This model exploits both the implicit and explicit features of the data simultaneously. More specifically, we design two types of feature extraction modules: one to capture the implicit dependencies hidden in the raw data at multiple time scales, and the other to obtain the long-term temporal dependencies of the time series. This approach not only inherits the advantages of the diffusion model for estimating missing data, but also takes into account the multiscale correlation inherent in traffic data. To illustrate the performance of the model, extensive experiments are conducted on three real-world time series datasets using different missing rates. The experimental results demonstrate that the model improves imputation accuracy and generalization capability.
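The abstract describes a conditional (imputation-style) diffusion model but not its exact training step. The following generic sketch shows how such models are commonly trained: noise is added to the series while an observation mask supplies the known values as conditioning and restricts the loss target to the missing entries. The tensor shapes and the masking convention are assumptions, not the paper's implicit-explicit architecture.

```python
import torch

def forward_noise(x0, mask, t, alphas_cumprod):
    """One forward-diffusion step for imputation-style training (generic sketch).
    x0:   (B, T, D) complete series used for training
    mask: (B, T, D) with 1 where values are observed, 0 where they are missing
    t:    (B,) sampled diffusion timesteps
    alphas_cumprod: 1-D tensor of cumulative noise-schedule products."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1)           # broadcast over (B, T, D)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    cond = x0 * mask                                    # observed values act as conditioning
    target = noise * (1 - mask)                         # denoising loss scored on missing entries only
    return x_t, cond, target
```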
Recently, diffusion models have emerged as a promising paradigm for molecular design and optimization. However, most diffusion-based molecular generative models focus on modeling 2D graphs or 3D geometries, with limited research on molecular sequence diffusion models. The International Union of Pure and Applied Chemistry (IUPAC) names are more akin to chemical natural language than the simplified molecular input line entry system (SMILES) for organic compounds. In this work, we apply an IUPAC-guided conditional diffusion model to facilitate molecular editing from chemical natural language to chemical language (SMILES) and explore whether the pre-trained generative performance of diffusion models can be transferred to chemical natural language. We propose DiffIUPAC, a controllable molecular editing diffusion model that converts IUPAC names to SMILES strings. Evaluation results demonstrate that our model outperforms existing methods and successfully captures the semantic rules of both chemical languages. Chemical space and scaffold analysis show that the model can generate similar compounds with diverse scaffolds within the specified constraints. Additionally, to illustrate the model's applicability in drug design, we conducted case studies in functional group editing, analogue design, and linker design.
Accurately identifying building distribution from remote sensing images with complex background information is challenging. The emergence of diffusion models has prompted the innovative idea of employing the reverse denoising process to distill building distribution from these complex backgrounds. Building on this concept, we propose a novel framework, the building extraction diffusion model (BEDiff), which meticulously refines the extraction of building footprints from remote sensing images in a stepwise fashion. Our approach begins with the design of booster guidance, a mechanism that extracts structural and semantic features from remote sensing images to serve as priors, thereby providing targeted guidance for the diffusion process. Additionally, we introduce a cross-feature fusion module (CFM) that bridges the semantic gap between different types of features, facilitating more effective integration of the attributes extracted by booster guidance into the diffusion process. Our proposed BEDiff marks the first application of diffusion models to the task of building extraction. Empirical evidence from extensive experiments on the Beijing building dataset demonstrates the superior performance of BEDiff, affirming its effectiveness and potential for enhancing the accuracy of building extraction in complex urban landscapes.
The application of generative artificial intelligence (AI) is bringing about notable changes in anime creation. This paper surveys recent advancements and applications of diffusion and language models in anime generation, focusing on their demonstrated potential to enhance production efficiency through automation and personalization. Despite these benefits, the substantial initial computational investment required to train and deploy these models must be acknowledged. We conduct an in-depth survey of cutting-edge generative AI technologies, encompassing models such as Stable Diffusion and GPT, and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics. The surveyed literature indicates considerable maturity in the capacity of AI models to synthesize high-quality, aesthetically compelling anime images from textual prompts, alongside discernible progress in the generation of coherent narratives. However, achieving perfect long-form consistency, mitigating artifacts such as flickering in video sequences, and enabling fine-grained artistic control remain critical ongoing challenges. Building upon these advancements, research efforts have increasingly pivoted towards the synthesis of higher-dimensional content, such as video and three-dimensional assets, with recent studies demonstrating significant progress in this burgeoning field. Nevertheless, formidable challenges endure: foremost are the computational demands of training and deployment, which are particularly pronounced for high-dimensional generation such as video synthesis; additional persistent hurdles include maintaining spatio-temporal consistency across complex scenes and addressing ethical concerns surrounding bias and the preservation of human creative autonomy. This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries. We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models, the integration of multimodal inputs, and the balanced consideration of ethical implications, particularly regarding bias and human creative autonomy, thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
The internal structures of cells, the basic units of life, are a major wonder of the microscopic world. Cellular images provide an intriguing window to help explore and understand the composition and function of these structures. Scientific imagery combined with artistic expression can further expand the potential of imaging in educational dissemination and interdisciplinary applications.
Deep learning has achieved great progress in image recognition, segmentation, semantic recognition, and game theory. In this study, a recent deep learning model, the conditional diffusion model, was adopted as a surrogate model to predict heat transfer during the casting process in place of numerical simulation. The conditional diffusion model was established and trained with the geometry shape, the initial temperature field, and the temperature field at t_i as the condition, and random noise sampled from the standard normal distribution as the input; the output was the temperature field at t_{i+1}. Therefore, the temperature field at t_{i+1} can be predicted once the temperature field at t_i is known, and the continuous temperature fields of all time steps can be predicted from the initial temperature field of an arbitrary 2D geometry. A training set with 302 2D shapes and their simulated temperature fields at different time steps was established. The accuracy of the predicted temperature field for a single time step reaches 97.7%, and that for continuous time steps reaches 69.1%, with the main error located in the sand mold. The effect of geometry shape and initial temperature field on prediction accuracy was investigated; the former achieves better results than the latter because casting, mold, and chill can be identified by different colors in the input images. The diffusion model has thus demonstrated its potential as a surrogate model for numerical simulation of the casting process.
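Since the surrogate predicts the field at t_{i+1} from the field at t_i, a full casting history is obtained by rolling the one-step model forward. The sketch below shows that autoregressive rollout with hypothetical `denoiser`/`sampler` interfaces standing in for the trained conditional diffusion model and its reverse-diffusion sampling loop; it illustrates the procedure the abstract describes, not the authors' code.

```python
import torch

@torch.no_grad()
def rollout(denoiser, sampler, geometry, T0, n_steps):
    """Roll a one-step conditional diffusion surrogate forward in time.
    geometry, T0: (B, 1, H, W) images encoding the 2D casting geometry and the
    initial temperature field; each iteration predicts T at the next time step."""
    fields = [T0]
    T_i = T0
    for _ in range(n_steps):
        cond = torch.cat([geometry, T0, T_i], dim=1)   # condition channels for the denoiser
        noise = torch.randn_like(T_i)                  # reverse process starts from Gaussian noise
        T_next = sampler(denoiser, noise, cond)        # reverse diffusion -> temperature at t_{i+1}
        fields.append(T_next)
        T_i = T_next                                   # note: errors can accumulate across steps
    return torch.stack(fields, dim=0)                  # (n_steps + 1, B, 1, H, W) time history
```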
Multi-scale problems in Computational Fluid Dynamics (CFD) often require numerous simulations across various design parameters. Using a fixed mesh for all cases may fail to capture critical physical features. Moving mesh adaptation provides optimal resource allocation to obtain high-resolution flow fields on low-resolution meshes. However, most existing methods require manual experience, and their reliance on a posteriori flow information poses great challenges for practical applications. In addition, generating adaptive meshes directly from design parameters is difficult because of the highly nonlinear relationships involved. The diffusion model is currently the most popular model for generative tasks; it integrates the diffusion principle into deep learning to capture complex nonlinear correlations. A dual diffusion framework, Para2Mesh, is proposed to predict adaptive meshes from design parameters by exploiting the robust data-distribution learning ability of the diffusion model. Through iterative denoising, the proposed dual networks accurately reconstruct the flow field to provide flow features as supervised information and then achieve rapid and reliable mesh movement. Experiments in CFD scenarios demonstrate that Para2Mesh predicts similar meshes directly from design parameters with much higher efficiency than the traditional method. It could become a real-time adaptation tool to assist engineering design and optimization, providing a promising solution for high-resolution flow-field analysis.
Finding suitable initial noise that retains the original image's information is crucial for image-to-image (I2I) translation using text-to-image (T2I) diffusion models. A common approach is to add random noise directly to the original image, as in SDEdit. However, we have observed that this can result in "semantic discrepancy" issues, wherein T2I diffusion models misinterpret the semantic relationships and generate content not present in the original image. We identify that the noise introduced by SDEdit disrupts the semantic integrity of the image, leading to unintended associations between unrelated regions after U-Net upsampling. Building on the widely used latent diffusion model, Stable Diffusion, we propose a training-free, plug-and-play method to alleviate semantic discrepancy and enhance the fidelity of the translated image. By leveraging the deterministic nature of denoising diffusion implicit model (DDIM) inversion, we correct the erroneous features and correlations from the original generative process with accurate ones from DDIM inversion. This approach alleviates semantic discrepancy and surpasses recent DDIM-inversion-based methods such as PnP with fewer priors, achieving a speedup of 11.2 times in experiments conducted on the COCO, ImageNet, and ImageNet-R datasets across multiple I2I translation tasks.
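DDIM inversion, which the abstract leverages, is a standard deterministic procedure: the DDIM update is run from low to high noise so that a clean latent is mapped to the noise that would regenerate it. A generic sketch follows; the `unet(x, t, cond)` noise-prediction interface and the timestep schedule are assumptions, not the paper's exact implementation.

```python
import torch

@torch.no_grad()
def ddim_invert(unet, x0_latent, alphas_cumprod, timesteps, cond):
    """Deterministic DDIM inversion (generic sketch).
    x0_latent:      clean latent of the source image
    alphas_cumprod: 1-D tensor of cumulative noise-schedule products
    timesteps:      increasing list of timestep indices (low noise -> high noise)
    unet(x, t, cond) is assumed to predict the noise eps_theta."""
    x = x0_latent
    for t_prev, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_prev, a_next = alphas_cumprod[t_prev], alphas_cumprod[t_next]
        eps = unet(x, t_prev, cond)
        x0_pred = (x - (1 - a_prev).sqrt() * eps) / a_prev.sqrt()  # predicted clean latent
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps    # deterministic step toward noise
    return x                                                       # "initial noise" that regenerates x0_latent
```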
Objective: This study aimed to explore a novel method that integrates segmentation-guided classification and diffusion model augmentation to realize automatic classification of tibial plateau fractures (TPFs). Methods: YOLOv8n-cls was used to construct a baseline model on data from 3781 patients of the Orthopedic Trauma Center of Wuhan Union Hospital. Additionally, a segmentation-guided classification approach was proposed. To enhance the dataset, a diffusion model was further employed for data augmentation. Results: The novel method integrating segmentation-guided classification and diffusion model augmentation significantly improved the accuracy and robustness of fracture classification. The average classification accuracy for TPFs rose from 0.844 to 0.896. The comprehensive performance of the dual-stream model was also significantly enhanced after many rounds of training, with both the macro-area under the curve (AUC) and the micro-AUC increasing from 0.94 to 0.97. By utilizing diffusion model augmentation and segmentation map integration, the model demonstrated superior efficacy in identifying Schatzker I, achieving an accuracy of 0.880. It yielded an accuracy of 0.898 for Schatzker II and III and 0.913 for Schatzker IV; for Schatzker V and VI, the accuracy was 0.887; and for intercondylar ridge fractures, the accuracy was 0.923. Conclusion: The dual-stream attention-based classification network, verified by many experiments, exhibited great potential in predicting the classification of TPFs. This method facilitates automatic TPF assessment and may assist surgeons in the rapid formulation of surgical plans.
With the growing demand for high-precision flow field simulations in computational science and engineering, the super-resolution reconstruction of physical fields has attracted considerable research interest. However, traditional numerical methods often entail high computational costs, involve complex data processing, and struggle to capture fine-scale high-frequency details. To address these challenges, we propose an innovative super-resolution reconstruction framework that integrates a Fourier neural operator (FNO) with an enhanced diffusion model. The framework employs an adaptively weighted FNO to process low-resolution flow field inputs, effectively capturing global dependencies and high-frequency features. Furthermore, a residual-guided diffusion model is introduced to further improve reconstruction performance. This model uses a Markov chain for noise injection in physical fields and integrates a reverse denoising procedure, efficiently solved by an adaptive time-step ordinary differential equation solver, thereby ensuring both stability and computational efficiency. Experimental results demonstrate that the proposed framework significantly outperforms existing methods in terms of accuracy and efficiency, offering a promising solution for fine-grained data reconstruction in scientific simulations.
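The core FNO building block referenced above is a spectral convolution that mixes channels on a truncated set of Fourier modes. The sketch below is a simplified generic layer: it keeps only the lowest-frequency corner of the spectrum and omits the pointwise linear path and the adaptive weighting the paper adds, so it is meant only to illustrate the mechanism.

```python
import torch

class SpectralConv2d(torch.nn.Module):
    """Minimal 2-D Fourier layer in the spirit of an FNO (simplified generic sketch)."""
    def __init__(self, in_ch, out_ch, modes):
        super().__init__()
        scale = 1.0 / (in_ch * out_ch)
        self.modes = modes
        # learnable complex weights acting on the retained low-frequency modes
        self.weight = torch.nn.Parameter(
            scale * torch.randn(in_ch, out_ch, modes, modes, dtype=torch.cfloat))

    def forward(self, x):                        # x: (B, C, H, W) real field
        x_ft = torch.fft.rfft2(x)                # (B, C, H, W//2 + 1) complex spectrum
        out_ft = torch.zeros(x.size(0), self.weight.size(1),
                             x_ft.size(-2), x_ft.size(-1),
                             dtype=torch.cfloat, device=x.device)
        m = self.modes
        out_ft[:, :, :m, :m] = torch.einsum(     # channel mixing on low-frequency modes only
            "bixy,ioxy->boxy", x_ft[:, :, :m, :m], self.weight)
        return torch.fft.irfft2(out_ft, s=x.shape[-2:])   # back to the spatial domain

layer = SpectralConv2d(in_ch=3, out_ch=8, modes=12)
y = layer(torch.randn(2, 3, 64, 64))             # (2, 8, 64, 64)
print(y.shape)
```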
Obtaining unsteady hydrodynamic performance data is of great significance for seaplane design. Common methods for obtaining such data include tank tests and Computational Fluid Dynamics (CFD) numerical simulation, which are costly and time-consuming. It is therefore necessary to obtain unsteady hydrodynamic performance in a low-cost and high-precision manner. Because of the strong nonlinearity, complex data distribution, and temporal characteristics of unsteady hydrodynamic performance, predicting it is challenging. This paper proposes a Temporal Convolutional Diffusion Model (TCDM) for predicting the unsteady hydrodynamic performance of seaplanes given design parameters. Under the framework of a classifier-free guided diffusion model, TCDM learns the distribution patterns of unsteady hydrodynamic performance data with a denoising module based on a temporal convolutional network and captures the temporal features of the data. Using CFD simulation data, the proposed method is compared with alternative methods to demonstrate its accuracy and generalization. This paper provides a method that enables rapid and accurate prediction of unsteady hydrodynamic performance data, which is expected to shorten the design cycle of seaplanes.
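Classifier-free guidance, the sampling framework named above, combines conditional and unconditional noise predictions at each reverse step. A minimal sketch follows; the `model(x, t, cond)` interface, the use of `cond=None` for the unconditional branch, and the guidance scale are assumptions rather than the paper's settings.

```python
import torch

@torch.no_grad()
def cfg_denoise(model, x_t, t, design_params, guidance_scale=3.0):
    """Classifier-free guidance at one reverse-diffusion step (generic sketch):
    blend the design-parameter-conditioned and unconditional noise predictions.
    The model is assumed to have been trained with random condition dropout."""
    eps_cond = model(x_t, t, cond=design_params)
    eps_uncond = model(x_t, t, cond=None)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```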
Multi-instance image generation remains a challenging task in the field of computer vision. While existing diffusion models demonstrate impressive fidelity in image generation, they often struggle to precisely control each object's shape, pose, and size. Methods such as layout-to-image and mask-to-image provide spatial guidance but frequently suffer from object shape distortion, overlaps, and poor consistency, particularly in complex scenes with multiple objects. To address these issues, we introduce PolyDiffusion, a contour-based diffusion framework that encodes each object's contour as a boundary-coordinate sequence, decoupling object shapes and positions. This approach allows for better control over object geometry and spatial positioning, which is critical for achieving high-quality multi-instance generation. We formulate the training process as a multi-objective optimization problem balancing three key objectives: a denoising diffusion loss to maintain overall image fidelity, a cross-attention contour alignment loss to ensure precise shape adherence, and a reward-guided denoising objective that minimizes the Fréchet distance to real images. In addition, the Object Space-Aware Attention module fuses contour tokens with visual features, while a prior-guided fusion mechanism exploits inter-object spatial relationships and class semantics to enhance consistency across multiple objects. Experimental results on benchmark datasets such as COCO-Stuff and VOC-2012 demonstrate that PolyDiffusion significantly outperforms existing layout-to-image and mask-to-image methods, achieving notable improvements in both image quality and instance-level segmentation accuracy. The implementation of PolyDiffusion is available at https://github.com/YYYYYJS/PolyDiffusion (accessed on 06 August 2025).
Supervised learning-based rail fastener anomaly detection models are limited by the scarcity of anomaly samples and perform poorly under data imbalance. However, unsupervised anomaly detection methods based on diffusion models, while reducing the dependence on anomalous samples, suffer from excessive iterations and over-smoothed reconstructed images. In this work, we establish a rail fastener anomaly detection framework called Diff-Fastener that introduces the diffusion model into the fastener detection task. During training, half of the normal samples are converted online into anomaly samples, and one-step denoising together with canonical guided denoising is used instead of iterative denoising, improving the reconstruction efficiency of the model while solving the over-smoothing problem. A Dilated Attention Convolution Module (DACM) is proposed for the middle layer of the reconstruction network to enrich the detail information of the reconstructed image; meanwhile, sparse-skip connections are used instead of dense connections to reduce the computational load of the model and enhance its scalability. Exhaustive experiments on the MVTec, VisA, and railroad fastener datasets show that Diff-Fastener achieves 99.1% image AUROC (Area Under the Receiver Operating Characteristic) and 98.9% pixel AUROC on the railroad fastener dataset, outperforming existing models, and achieves the best average score on the MVTec and VisA datasets. Our research provides new ideas and directions in the field of anomaly detection for rail fasteners.
Traditional steganography conceals information by modifying cover data, but steganalysis tools easily detect such alterations, while deep learning-based steganography often involves high training costs and complex deployment. Diffusion model-based methods face security vulnerabilities, particularly potential information leakage during generation. We propose a fixed neural network image steganography framework based on secure diffusion models to address these challenges. Unlike conventional approaches, our method minimizes cover modifications through neural network optimization, achieving superior steganographic performance under both human visual perception and computer vision analyses. The cover images are generated in an anime style using state-of-the-art diffusion models, ensuring that the transmitted images appear more natural. This study introduces fixed neural network technology that allows senders to transmit only minimal critical information alongside stego-images. Recipients can accurately reconstruct secret images using this compact data, significantly reducing transmission overhead compared with conventional deep steganography. Furthermore, our framework innovatively integrates the ElGamal cryptographic algorithm to protect critical information during transmission, enhancing overall system security and ensuring end-to-end information protection. This dual optimization of payload reduction and cryptographic reinforcement establishes a new paradigm for secure and efficient image steganography.
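ElGamal, the cryptographic algorithm integrated above, is a standard public-key scheme. The toy sketch below shows textbook ElGamal over Z_p* with deliberately small parameters for readability; a real deployment would use a large prime or an elliptic-curve variant, and the abstract does not specify which instantiation the authors use.

```python
import secrets

def elgamal_keygen(p, g):
    """Textbook ElGamal key generation over Z_p* (toy parameters, illustration only)."""
    x = secrets.randbelow(p - 2) + 1          # private key in [1, p-2]
    return x, pow(g, x, p)                    # (private x, public h = g^x mod p)

def elgamal_encrypt(p, g, h, m):
    k = secrets.randbelow(p - 2) + 1          # fresh ephemeral randomness per message
    return pow(g, k, p), (m * pow(h, k, p)) % p   # ciphertext (c1, c2)

def elgamal_decrypt(p, x, c1, c2):
    s = pow(c1, x, p)                         # shared secret c1^x = g^{kx}
    return (c2 * pow(s, p - 2, p)) % p        # multiply by s^{-1} (Fermat's little theorem)

p, g = 8191, 17                               # small prime for demonstration only
x, h = elgamal_keygen(p, g)
c1, c2 = elgamal_encrypt(p, g, h, 1234)       # 1234 stands in for encoded critical information
assert elgamal_decrypt(p, x, c1, c2) == 1234
```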
Air target intent recognition holds significant importance in aiding commanders to assess battlefield situations and secure a competitive edge in decision-making. Progress in this domain has been hindered by challenges posed by imbalanced battlefield data and the limited robustness of traditional recognition models. Inspired by the success of diffusion models in addressing sample imbalance in the visual domain, this paper introduces a new approach that utilizes the Markov Transition Field (MTF) method for time series data visualization. This visualization, when combined with the Denoising Diffusion Probabilistic Model (DDPM), effectively enhances the sample data and mitigates noise within the original dataset. Additionally, a transformer-based model tailored for time series visualization and air target intent recognition is developed. Comprehensive experimental results, encompassing comparative, ablation, and denoising validations, reveal that the proposed method achieves a notable 98.86% accuracy in air target intent recognition while demonstrating exceptional robustness and generalization capabilities. This approach represents a promising avenue for advancing air target intent recognition.
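A Markov Transition Field turns a 1-D series into an image whose pixel (i, j) holds the probability of transitioning from the quantile bin of x_i to the bin of x_j, which is what makes the data amenable to image-based diffusion augmentation. The plain-NumPy sketch below shows the standard construction; the bin count and the example signal are illustrative, not the paper's settings.

```python
import numpy as np

def markov_transition_field(series, n_bins=8):
    """Build a Markov Transition Field image from a 1-D series (standard construction).
    Values are quantile-binned, a bin-to-bin transition matrix is estimated from
    consecutive samples, and pixel (i, j) is the transition probability from the
    bin of x_i to the bin of x_j."""
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(series, edges)                     # bin index per time step, 0..n_bins-1
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(bins[:-1], bins[1:]):                 # count consecutive transitions
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)      # row-normalise to probabilities
    return W[np.ix_(bins, bins)]                          # (T, T) image

mtf = markov_transition_field(np.sin(np.linspace(0, 6 * np.pi, 128)), n_bins=8)
print(mtf.shape)                                          # (128, 128) grayscale-like image
```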
With the rapid development of Internet of Things technology, the sharp increase in network devices and their inherent security vulnerabilities present a stark contrast, bringing unprecedented challenges to the field of network security, especially in identifying malicious attacks. Due to the uneven distribution of network traffic data, particularly the imbalance between attack traffic and normal traffic as well as the imbalance between minority-class and majority-class attacks, traditional machine learning detection algorithms have significant limitations when dealing with sparse network traffic data. To effectively tackle this challenge, we have designed a lightweight intrusion detection model based on diffusion mechanisms, named Diff-IDS, with the core objective of enhancing the model's efficiency in parsing complex network traffic features, thereby significantly improving its detection speed and training efficiency. The model begins by finely filtering network traffic features and converting them into grayscale images, while also employing image-flipping techniques for data augmentation. Subsequently, these preprocessed images are fed into a diffusion model based on the U-Net architecture for training. Once the model is trained, we fix the weights of the U-Net network and propose a feature enhancement algorithm based on feature masking to further boost the model's expressiveness. Finally, we devise an end-to-end lightweight detection strategy to streamline the model, enabling efficient lightweight detection of imbalanced samples. Our method has been evaluated on well-known network intrusion detection benchmarks, including CICIDS 2017, KDD 99, and NSL-KDD. The experimental results indicate that Diff-IDS leads the current state-of-the-art models in detection accuracy, training efficiency, and lightweight metrics, demonstrating exceptional detection capabilities and robustness.
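The preprocessing step described above (selected traffic features rendered as grayscale images, then flip-augmented) can be sketched as follows. The feature count, the 8x8 image size, and min-max scaling are assumptions for illustration; the paper's exact feature selection and image layout are not given in the abstract.

```python
import numpy as np

def features_to_grayscale(x, side=8):
    """Turn a filtered traffic-feature vector into a square grayscale image and
    return simple flip-based augmentations (generic sketch of the described step).
    Values are min-max scaled to [0, 255] and zero-padded to fill the square."""
    x = np.asarray(x, dtype=np.float32)
    x = (x - x.min()) / (x.max() - x.min() + 1e-8) * 255.0
    img = np.zeros(side * side, dtype=np.float32)
    img[: min(x.size, side * side)] = x[: side * side]
    img = img.reshape(side, side)
    return [img, np.fliplr(img), np.flipud(img)]          # original + horizontal/vertical flips

samples = features_to_grayscale(np.random.rand(60))       # e.g., 60 selected flow features
print(len(samples), samples[0].shape)                     # 3 augmented 8x8 images
```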
The task of molecule generation guided by specific text descriptions has been proposed to generate molecules that match given text inputs. Mainstream methods typically use the simplified molecular input line entry system (SMILES) to represent molecules and rely on diffusion models or autoregressive structures for modeling. However, the one-to-many mapping diversity that arises when using SMILES to represent molecules causes existing methods to require complex model architectures and larger training datasets to improve performance, which affects the efficiency of model training and generation. In this paper, we propose a text-guided diverse-expression diffusion (TGDD) model for molecule generation. TGDD combines both SMILES and self-referencing embedded strings (SELFIES) into a novel diverse-expression molecular representation, enabling precise molecule mapping based on natural language. By leveraging this diverse-expression representation, TGDD simplifies the segmented diffusion generation process, achieving faster training and reduced memory consumption while also exhibiting stronger alignment with natural language. TGDD outperforms both TGM-LDM and the autoregressive model MolT5-Base on most evaluation metrics.
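For readers unfamiliar with the two string representations combined above, the snippet below round-trips a molecule between SMILES and SELFIES using the open-source `selfies` package (assumed installed via pip); aspirin is used purely as an example and is not taken from the paper.

```python
# SMILES <-> SELFIES round trip; SELFIES strings are robust in the sense that any
# token sequence decodes to a syntactically valid molecule, which is one reason
# sequence models often pair the two representations.
import selfies as sf

smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"          # aspirin in SMILES
selfies_str = sf.encoder(smiles)             # SMILES -> SELFIES
roundtrip = sf.decoder(selfies_str)          # SELFIES -> SMILES

print(selfies_str)
print(roundtrip)
```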
基金supported by the National Natural Science Foundation of China(Grant No.72161034).
文摘Human motion modeling is a core technology in computer animation,game development,and humancomputer interaction.In particular,generating natural and coherent in-between motion using only the initial and terminal frames remains a fundamental yet unresolved challenge.Existing methods typically rely on dense keyframe inputs or complex prior structures,making it difficult to balance motion quality and plausibility under conditions such as sparse constraints,long-term dependencies,and diverse motion styles.To address this,we propose a motion generation framework based on a frequency-domain diffusion model,which aims to better model complex motion distributions and enhance generation stability under sparse conditions.Our method maps motion sequences to the frequency domain via the Discrete Cosine Transform(DCT),enabling more effective modeling of low-frequency motion structures while suppressing high-frequency noise.A denoising network based on self-attention is introduced to capture long-range temporal dependencies and improve global structural awareness.Additionally,a multi-objective loss function is employed to jointly optimize motion smoothness,pose diversity,and anatomical consistency,enhancing the realism and physical plausibility of the generated sequences.Comparative experiments on the Human3.6M and LaFAN1 datasets demonstrate that our method outperforms state-of-the-art approaches across multiple performance metrics,showing stronger capabilities in generating intermediate motion frames.This research offers a new perspective and methodology for human motion generation and holds promise for applications in character animation,game development,and virtual interaction.
基金supported by a grant from the National Natural Science Foundation of China (NSFC) No. 81070826/30872886/30400497Sponsored by Shanghai Rising-Star Program No. 09QA1403700+1 种基金funded by Shanghai Leading Academic Discipline Project (Project Number: S30206)the Science and Technology Commission of Shanghai (08DZ2271100)
文摘Aim The purpose of this study was to develop a mathe-matical model to quantitatively describe the passive trans-port of macromolecules within dental biofilms. Methodology Fluorescently labeled dextrans with different molecular mass (3 kD,10 kD,40 kD,70 kD,2 000 kD) were used as a series of diffusion probes. Streptococcus mutans,Streptococcus sanguinis,Actinomyces naeslundii and Fusobacterium nucleatum were used as inocula for biofilm formation. The diffusion processes of different probes through the in vitro biofilm were recorded with a confocal laser microscope. Results Mathematical function of biofilm penetration was constructed on the basis of the inverse problem method. Based on this function,not only the relationship between average concentration of steady-state and molecule weights can be analyzed,but also that between penetrative time and molecule weights. Conclusion This can be used to predict the effective concentration and the penetrative time of anti-biofilm medicines that can diffuse through oral biofilm. Further-more,an improved model for large molecule is proposed by considering the exchange time at the upper boundary of the dental biofilm.
文摘In this study,the removal of monovalent and divalent cations,Nat,Kt,Mg2t,and Ca2t,in a diluted solution from Chott-El Jerid Lake,Tunisia,was investigated with the electrodialysis technique.The process was tested using two cation-exchange membranes:sulfonated polyether sulfone cross-linked with 10%hexamethylenediamine(HEXCl)and sulfonated polyether sulfone grafted with octylamine(S-PESOS).The commercially available membrane Nafion®was used for comparison.The results showed that Nafion®and S-PESOS membranes had similar removal behaviors,and the investigated cations were ranked in the following descending order in terms of their demineralization rates:Nat>Ca2t>Mg2t>Kt.Divalent cations were more effectively removed by HEXCl than by monovalent cations.The plots based on the WebereMorris model showed a strong linearity.This reveals that intra-particle diffusion was not the removal rate-determining step,and the removal process was controlled by two or more concurrent mechanisms.The Boyd plots did not pass through their origin,and the sole controlling step was determined by film-diffusion resistance,especially after a long period of electrodialysis.Additionally,a semi-empirical model was established to simulate the temporal variation of the treatment process,and the physical significance and values of model parameters were compared for the three membranes.The findings of this study indicate that HEXCl and S-PESOS membranes can be efficiently utilized for water softening,especially when effluents are highly loaded with calcium and magnesium ions.
基金partially supported by the National Natural Science Foundation of China(62271485)the SDHS Science and Technology Project(HS2023B044)
文摘Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems(ITS)in the real world.As a state-of-the-art generative model,the diffusion model has proven highly successful in image generation,speech generation,time series modelling etc.and now opens a new avenue for traffic data imputation.In this paper,we propose a conditional diffusion model,called the implicit-explicit diffusion model,for traffic data imputation.This model exploits both the implicit and explicit feature of the data simultaneously.More specifically,we design two types of feature extraction modules,one to capture the implicit dependencies hidden in the raw data at multiple time scales and the other to obtain the long-term temporal dependencies of the time series.This approach not only inherits the advantages of the diffusion model for estimating missing data,but also takes into account the multiscale correlation inherent in traffic data.To illustrate the performance of the model,extensive experiments are conducted on three real-world time series datasets using different missing rates.The experimental results demonstrate that the model improves imputation accuracy and generalization capability.
基金supported by the Yonsei University graduate school Department of Integrative Biotechnology.
文摘Recently,diffusion models have emerged as a promising paradigm for molecular design and optimization.However,most diffusion-based molecular generative models focus on modeling 2D graphs or 3D geom-etries,with limited research on molecular sequence diffusion models.The International Union of Pure and Applied Chemistry(IUPAC)names are more akin to chemical natural language than the simplified molecular input line entry system(SMILES)for organic compounds.In this work,we apply an IUPAC-guided conditional diffusion model to facilitate molecular editing from chemical natural language to chemical language(SMILES)and explore whether the pre-trained generative performance of diffusion models can be transferred to chemical natural language.We propose DiffIUPAC,a controllable molecular editing diffusion model that converts IUPAC names to SMILES strings.Evaluation results demonstrate that our model out-performs existing methods and successfully captures the semantic rules of both chemical languages.Chemical space and scaffold analysis show that the model can generate similar compounds with diverse scaffolds within the specified constraints.Additionally,to illustrate the model’s applicability in drug design,we conducted case studies in functional group editing,analogue design and linker design.
基金supported by the National Natural Science Foundation of China(Nos.61906168,62202429 and 62272267)the Zhejiang Provincial Natural Science Foundation of China(No.LY23F020023)the Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects(No.2022SDSJ01)。
文摘Accurately identifying building distribution from remote sensing images with complex background information is challenging.The emergence of diffusion models has prompted the innovative idea of employing the reverse denoising process to distill building distribution from these complex backgrounds.Building on this concept,we propose a novel framework,building extraction diffusion model(BEDiff),which meticulously refines the extraction of building footprints from remote sensing images in a stepwise fashion.Our approach begins with the design of booster guidance,a mechanism that extracts structural and semantic features from remote sensing images to serve as priors,thereby providing targeted guidance for the diffusion process.Additionally,we introduce a cross-feature fusion module(CFM)that bridges the semantic gap between different types of features,facilitating the integration of the attributes extracted by booster guidance into the diffusion process more effectively.Our proposed BEDiff marks the first application of diffusion models to the task of building extraction.Empirical evidence from extensive experiments on the Beijing building dataset demonstrates the superior performance of BEDiff,affirming its effectiveness and potential for enhancing the accuracy of building extraction in complex urban landscapes.
基金supported by the National Natural Science Foundation of China(Grant No.62202210).
文摘The application of generative artificial intelligence(AI)is bringing about notable changes in anime creation.This paper surveys recent advancements and applications of diffusion and language models in anime generation,focusing on their demonstrated potential to enhance production efficiency through automation and personalization.Despite these benefits,it is crucial to acknowledge the substantial initial computational investments required for training and deploying these models.We conduct an in-depth survey of cutting-edge generative AI technologies,encompassing models such as Stable Diffusion and GPT,and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics.Review of the surveyed literature indicates the achievement of considerable maturity in the capacity of AI models to synthesize high-quality,aesthetically compelling anime visual images from textual prompts,alongside discernible progress in the generation of coherent narratives.However,achieving perfect long-form consistency,mitigating artifacts like flickering in video sequences,and enabling fine-grained artistic control remain critical ongoing challenges.Building upon these advancements,research efforts have increasingly pivoted towards the synthesis of higher-dimensional content,such as video and three-dimensional assets,with recent studies demonstrating significant progress in this burgeoning field.Nevertheless,formidable challenges endure amidst these advancements.Foremost among these are the substantial computational exigencies requisite for training and deploying these sophisticated models,particularly pronounced in the realm of high-dimensional generation such as video synthesis.Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and mitigating ethical considerations surrounding bias and the preservation of human creative autonomy.This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries.We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models,the integration of multimodal inputs,and the balanced consideration of ethical implications,particularly regarding bias and the preservation of human creative autonomy,thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
基金supported by the Fundamental Research Funds for the Central Universities(No.226-2024-00038),China.
文摘The internal structures of cells as the basic units of life are a major wonder of the microscopic world.Cellular images provide an intriguing window to help explore and understand the composition and function of these structures.Scientific imagery combined with artistic expression can further expand the potential of imaging in educational dissemination and interdisciplinary applications.
基金sponsored by Tsinghua-Toyota Joint Research Fund
文摘Deep learning has achieved great progress in image recognition,segmentation,semantic recognition and game theory.In this study,a latest deep learning model,a conditional diffusion model was adopted as a surrogate model to predict the heat transfer during the casting process instead of numerical simulation.The conditional diffusion model was established and trained with the geometry shapes,initial temperature fields and temperature fields at t_(i) as the condition and random noise sampled from standard normal distribution as the input.The output was the temperature field at t_(i+1).Therefore,the temperature field at t_(i+1)can be predicted as the temperature field at t_(i) is known,and the continuous temperature fields of all the time steps can be predicted based on the initial temperature field of an arbitrary 2D geometry.A training set with 3022D shapes and their simulated temperature fields at different time steps was established.The accuracy for the temperature field for a single time step reaches 97.7%,and that for continuous time steps reaches 69.1%with the main error actually existing in the sand mold.The effect of geometry shape and initial temperature field on the prediction accuracy was investigated,the former achieves better result than the latter because the former can identify casting,mold and chill by different colors in the input images.The diffusion model has proved the potential as a surrogate model for numerical simulation of the casting process.
基金co-supported by the Aeronautical Science Foundation of China(Nos.2018ZA52002 and 2019ZA052011)。
文摘Multi-scale problems in Computational Fluid Dynamics(CFD)often require numerous simulations across various design parameters.Using a fixed mesh for all cases may fail to capture critical physical features.Moving mesh adaptation provides an optimal resource allocation to obtain high-resolution flow-fields on low-resolution meshes.However,most existing methods require manual experience and the flow posteriori information poses great challenges to practical applications.In addition,generating adaptive meshes directly from design parameters is difficult due to highly nonlinear relationships.The diffusion model is currently the most popular model in generative tasks that integrates the diffusion principle into deep learning to capture the complex nonlinear correlations.A dual diffusion framework,Para2Mesh,is proposed to predict the adaptive meshes from design parameters by exploiting the robust data distribution learning ability of the diffusion model.Through iterative denoising,the proposed dual networks accurately reconstruct the flow-field to provide flow features as supervised information,and then achieve rapid and reliable mesh movement.Experiments in CFD scenarios demonstrate that Para2Mesh predicts similar meshes directly from design parameters with much higher efficiency than traditional method.It could become a real-time adaptation tool to assist engineering design and optimization,providing a promising solution for high-resolution flow-field analysis.
基金supported in part by the National Natural Science Foundation of China(62176059)supported by The Pennsylvania State University.
文摘Finding suitable initial noise that retains the original image’s information is crucial for image-to-image(I2I)translation using text-to-image(T2I)diffusion models.A common approach is to add random noise directly to the original image,as in SDEdit.However,we have observed that this can result in“semantic discrepancy”issues,wherein T2I diffusion models misinterpret the semantic relationships and generate content not present in the original image.We identify that the noise introduced by SDEdit disrupts the semantic integrity of the image,leading to unintended associations between unrelated regions after U-Net upsampling.Building on the widely-used latent diffusion model,Stable Diffusion,we propose a training-free,plugand-play method to alleviate semantic discrepancy and enhance the fidelity of the translated image.By leveraging the deterministic nature of denoising diffusion implicit models(DDIMs)inversion,we correct the erroneous features and correlations from the original generative process with accurate ones from DDIM inversion.This approach alleviates semantic discrepancy and surpasses recent DDIM-inversion-based methods such as PnP with fewer priors,achieving a speedup of 11.2 times in experiments conducted on COCO,ImageNet,and ImageNet-R datasets across multiple I2I translation tasks.
基金supported by the National Natural Science Foundation of China(Nos.81974355 and 82172524)Key Research and Development Program of Hubei Province(No.2021BEA161)+2 种基金National Innovation Platform Development Program(No.2020021105012440)Open Project Funding of the Hubei Key Laboratory of Big Data Intelligent Analysis and Application,Hubei University(No.2024BDIAA03)Free Innovation Preliminary Research Fund of Wuhan Union Hospital(No.2024XHYN047).
文摘Objective This study aimed to explore a novel method that integrates the segmentation guidance classification and the dif-fusion model augmentation to realize the automatic classification for tibial plateau fractures(TPFs).Methods YOLOv8n-cls was used to construct a baseline model on the data of 3781 patients from the Orthopedic Trauma Center of Wuhan Union Hospital.Additionally,a segmentation-guided classification approach was proposed.To enhance the dataset,a diffusion model was further demonstrated for data augmentation.Results The novel method that integrated the segmentation-guided classification and diffusion model augmentation sig-nificantly improved the accuracy and robustness of fracture classification.The average accuracy of classification for TPFs rose from 0.844 to 0.896.The comprehensive performance of the dual-stream model was also significantly enhanced after many rounds of training,with both the macro-area under the curve(AUC)and the micro-AUC increasing from 0.94 to 0.97.By utilizing diffusion model augmentation and segmentation map integration,the model demonstrated superior efficacy in identifying SchatzkerⅠ,achieving an accuracy of 0.880.It yielded an accuracy of 0.898 for SchatzkerⅡandⅢand 0.913 for SchatzkerⅣ;for SchatzkerⅤandⅥ,the accuracy was 0.887;and for intercondylar ridge fracture,the accuracy was 0.923.Conclusion The dual-stream attention-based classification network,which has been verified by many experiments,exhibited great potential in predicting the classification of TPFs.This method facilitates automatic TPF assessment and may assist surgeons in the rapid formulation of surgical plans.
基金supported by the National Natural Science Foundation of China(Grant Nos.42005003 and 41475094)National Key R&D Program of China(Grant No.2018YFC1506704).
文摘With the growing demand for high-precision flow field simulations in computational science and engineering,the super-resolution reconstruction of physical fields has attracted considerable research interest.However,tradi-tional numerical methods often entail high computational costs,involve complex data processing,and struggle to capture fine-scale high-frequency details.To address these challenges,we propose an innovative super-resolution reconstruction framework that integrates a Fourier neural operator(FNO)with an enhanced diffusion model.The framework employs an adaptively weighted FNO to process low-resolution flow field inputs,effectively capturing global dependencies and high-frequency features.Furthermore,a residual-guided diffusion model is introduced to further improve reconstruction performance.This model uses a Markov chain for noise injection in phys-ical fields and integrates a reverse denoising procedure,efficiently solved by an adaptive time-step ordinary differential equation solver,thereby ensuring both stability and computational efficiency.Experimental results demonstrate that the proposed framework significantly outperforms existing methods in terms of accuracy and efficiency,offering a promising solution for fine-grained data reconstruction in scientific simulations.
基金supported by the Aeronautical Science Foundation of China(Nos.2018ZA52002,2019ZA052011)the National Natural Science Foundation of China(No.12472236).
文摘Obtaining unsteady hydrodynamic performance is of great significance for seaplane design.Common methods for obtaining unsteady hydrodynamic performance data include tank test and Computational Fluid Dynamics(CFD)numerical simulation,which are costly and time-consuming.Therefore,it is necessary to obtain unsteady hydrodynamic performance in a low-cost and high-precision manner.Due to the strong nonlinearity,complex data distribution,and temporal characteristics of unsteady hydrodynamic performance,the prediction of it is challenging.This paper proposes a Temporal Convolutional Diffusion Model(TCDM)for predicting the unsteady hydrodynamic performance of seaplanes given design parameters.Under the framework of a classifier-free guided diffusion model,TCDM learns the distribution patterns of unsteady hydrodynamic performance data with the designed denoising module based on temporal convolutional network and captures the temporal features of unsteady hydrodynamic performance data.Using CFD simulation data,the proposed method is compared with the alternative methods to demonstrate its accuracy and generalization.This paper provides a method that enables the rapid and accurate prediction of unsteady hydrodynamic performance data,expecting to shorten the design cycle of seaplanes.
基金supported in part by the Scientific Research Fund of National Natural Science Foundation of China(Grant No.62372168)the Hunan Provincial Natural Science Foundation of China(Grant No.2023JJ30266)+2 种基金the Research Project on teaching reform in Hunan province(No.HNJG-2022-0791)the Hunan University of Science and Technology(No.2022-44-8)the National Social Science Funds of China(19BZX044).
文摘Multi-instance image generation remains a challenging task in the field of computer vision.While existing diffusionmodels demonstrate impressive fidelity in image generation,they often struggle with precisely controlling each object’s shape,pose,and size.Methods like layout-to-image and mask-to-image provide spatial guidance but frequently suffer from object shape distortion,overlaps,and poor consistency,particularly in complex scenes with multiple objects.To address these issues,we introduce PolyDiffusion,a contour-based diffusion framework that encodes each object’s contour as a boundary-coordinate sequence,decoupling object shapes and positions.This approach allows for better control over object geometry and spatial positioning,which is critical for achieving high-quality multiinstance generation.We formulate the training process as a multi-objective optimization problem,balancing three key objectives:a denoising diffusion loss to maintain overall image fidelity,a cross-attention contour alignment loss to ensure precise shape adherence,and a reward-guided denoising objective that minimizes the Fréchet distance to real images.In addition,the Object Space-Aware Attention module fuses contour tokens with visual features,while a prior-guided fusion mechanism utilizes inter-object spatial relationships and class semantics to enhance consistency across multiple objects.Experimental results on benchmark datasets such as COCO-Stuff and VOC-2012 demonstrate that PolyDiffusion significantly outperforms existing layout-to-image and mask-to-image methods,achieving notable improvements in both image quality and instance-level segmentation accuracy.The implementation of Poly Diffusion is available at https://github.com/YYYYYJS/PolyDiffusion(accessed on 06 August 2025).
Funding: Funded by the National Natural Science Foundation of China, grant numbers 52272385 and 52475085.
Abstract: Supervised learning-based rail fastener anomaly detection models are limited by the scarcity of anomaly samples and perform poorly under data imbalance. Unsupervised anomaly detection methods based on diffusion models reduce the dependence on anomalous samples but suffer from too many iterations and excessive smoothing of reconstructed images. In this work, we establish a rail fastener anomaly detection framework called Diff-Fastener that introduces the diffusion model into the fastener detection task. During training, half of the normal samples are converted into anomaly samples online, and one-step denoising together with a canonical guided denoising paradigm replaces iterative denoising, improving the reconstruction efficiency of the model while alleviating excessive smoothing. A Dilated Attention Convolution Module (DACM) is proposed for the middle layer of the reconstruction network to enrich the detail of the reconstructed image; meanwhile, sparse skip connections replace dense connections to reduce the computational load of the model and enhance its scalability. Extensive experiments on the MVTec, VisA, and rail fastener datasets show that Diff-Fastener achieves 99.1% image-level AUROC (Area Under the Receiver Operating Characteristic curve) and 98.9% pixel-level AUROC on the rail fastener dataset, outperforming existing models, and it achieves the best average scores on the MVTec and VisA datasets. Our research provides new ideas and directions for anomaly detection of rail fasteners.
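A minimal sketch of the one-step denoising paradigm, assuming a trained noise-prediction network `eps_model` and standard DDPM notation: the input is noised to level t, the clean image is estimated in a single step, and the per-pixel reconstruction error serves as an anomaly map.

```python
# Sketch only: one-step denoising reconstruction and a reconstruction-error
# anomaly map. The network, noise level, and image sizes are placeholders.
import torch

@torch.no_grad()
def one_step_reconstruct(eps_model, x0, alpha_bar_t, t):
    """Noise the input to level t, then estimate the clean image in a single step."""
    noise = torch.randn_like(x0)
    x_t = alpha_bar_t.sqrt() * x0 + (1 - alpha_bar_t).sqrt() * noise      # forward diffusion
    eps_hat = eps_model(x_t, t)                                           # predicted noise
    x0_hat = (x_t - (1 - alpha_bar_t).sqrt() * eps_hat) / alpha_bar_t.sqrt()
    return x0_hat

def anomaly_map(x0, x0_hat):
    """Per-pixel anomaly score: reconstruction error averaged over channels."""
    return (x0 - x0_hat).abs().mean(dim=1, keepdim=True)

# Illustrative usage with a dummy "model" that predicts zero noise.
x0 = torch.rand(1, 3, 64, 64)
x0_hat = one_step_reconstruct(lambda x, t: torch.zeros_like(x), x0, torch.tensor(0.7), t=500)
print(anomaly_map(x0, x0_hat).shape)   # torch.Size([1, 1, 64, 64])
```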
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62102450 and 62272478, and the Independent Research Project of a Certain Unit under Grant ZZKY20243127.
Abstract: Traditional steganography conceals information by modifying cover data, but steganalysis tools easily detect such alterations, while deep learning-based steganography often involves high training costs and complex deployment. Diffusion model-based methods face security vulnerabilities, particularly potential information leakage during generation. We propose a fixed neural network image steganography framework based on secure diffusion models to address these challenges. Unlike conventional approaches, our method minimizes cover modifications through neural network optimization, achieving superior steganographic performance under both human visual perception and computer vision analysis. The cover images are generated in an anime style using state-of-the-art diffusion models, so the transmitted images appear more natural. This study introduces a fixed neural network technique that allows senders to transmit only a small amount of critical information alongside the stego-images. Recipients can accurately reconstruct secret images from this compact data, significantly reducing transmission overhead compared with conventional deep steganography. Furthermore, our framework integrates the ElGamal cryptographic algorithm to protect the critical information during transmission, enhancing overall system security and ensuring end-to-end information protection. This dual optimization of payload reduction and cryptographic reinforcement establishes a new paradigm for secure and efficient image steganography.
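For reference, a textbook ElGamal round trip over toy parameters (not the paper's implementation, and far too small to be secure) illustrates the encrypt/decrypt step used to protect the critical information in transit:

```python
# Sketch only: textbook ElGamal over a tiny prime. Real deployments need large,
# safe parameters and proper message encoding.
import random

p, g = 2579, 2                        # toy prime and base; insecure, for illustration only
x = random.randrange(2, p - 2)        # receiver's private key
h = pow(g, x, p)                      # receiver's public key

def elgamal_encrypt(m, h, p=p, g=g):
    k = random.randrange(2, p - 2)    # fresh ephemeral key per message
    return pow(g, k, p), (m * pow(h, k, p)) % p

def elgamal_decrypt(c1, c2, x, p=p):
    s = pow(c1, x, p)                 # shared secret g^(k*x)
    return (c2 * pow(s, -1, p)) % p   # modular inverse (Python 3.8+)

c1, c2 = elgamal_encrypt(1234, h)
assert elgamal_decrypt(c1, c2, x) == 1234
```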
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 61806219, 61876189, and 61703426), the Young Talent Fund of the University Association for Science and Technology in Shaanxi, China (Nos. 20190108 and 20220106), and the Innovation Talent Supporting Project of Shaanxi, China (No. 2020KJXX-065).
Abstract: Air target intent recognition is important for helping commanders assess battlefield situations and secure a competitive edge in decision-making. Progress in this domain has been hindered by imbalanced battlefield data and the limited robustness of traditional recognition models. Inspired by the success of diffusion models in addressing sample imbalance in the visual domain, this paper introduces a new approach that uses the Markov Transition Field (MTF) method to visualize time series data. This visualization, combined with the Denoising Diffusion Probabilistic Model (DDPM), effectively augments the sample data and mitigates noise in the original dataset. Additionally, a transformer-based model tailored for time series visualization and air target intent recognition is developed. Comprehensive experimental results, encompassing comparative, ablation, and denoising validations, show that the proposed method achieves 98.86% accuracy in air target intent recognition while demonstrating strong robustness and generalization. This approach represents a promising avenue for advancing air target intent recognition.
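A minimal sketch of the Markov Transition Field visualization step, assuming quantile binning and a first-order transition matrix; the bin count and example series are illustrative.

```python
# Sketch only: encode a 1-D time series as a Markov Transition Field image,
# the kind of visualization a DDPM can then augment.
import numpy as np

def markov_transition_field(series, n_bins=8):
    series = np.asarray(series, dtype=float)
    # quantile binning of the series into n_bins discrete states
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
    states = np.digitize(series, edges)                 # values in 0..n_bins-1
    # first-order Markov transition matrix between consecutive states
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(states[:-1], states[1:]):
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)    # row-normalize
    # MTF image: transition probability between the states at every time pair (i, j)
    return W[states[:, None], states[None, :]]          # shape (T, T)

t = np.linspace(0, 4 * np.pi, 64)
img = markov_transition_field(np.sin(t) + 0.1 * np.random.randn(64))
print(img.shape)                                        # (64, 64)
```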
Funding: Supported by the Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2024GXJS014 and ZDYF2023GXJS163), the National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022 and 62162024), and the Collaborative Innovation Project of Hainan University (XTCX2022XXB02).
Abstract: With the rapid development of Internet of Things technology, the sharp increase in network devices and their inherent security vulnerabilities bring unprecedented challenges to network security, especially in identifying malicious attacks. Because network traffic data are unevenly distributed, with imbalances between attack traffic and normal traffic and between minority-class and majority-class attacks, traditional machine learning detection algorithms have significant limitations on sparse network traffic data. To tackle this challenge, we design a lightweight intrusion detection model based on diffusion mechanisms, named Diff-IDS, whose core objective is to parse complex network traffic features more efficiently and thereby significantly improve detection speed and training efficiency. The model first filters network traffic features and converts them into grayscale images, using image flipping for data augmentation. These preprocessed images are then fed into a diffusion model based on the U-Net architecture for training. Once the model is trained, we fix the weights of the U-Net and propose a feature enhancement algorithm based on feature masking to further boost the model's expressiveness. Finally, we devise an end-to-end lightweight detection strategy to streamline the model, enabling efficient detection of imbalanced samples. Our method has been evaluated on well-known network intrusion detection benchmarks, including CICIDS 2017, KDD 99, and NSL-KDD. The experimental results indicate that Diff-IDS surpasses current state-of-the-art models in detection accuracy, training efficiency, and lightweight metrics, demonstrating strong detection capability and robustness.
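A minimal sketch of the preprocessing described above, under assumed feature counts and image size: a traffic feature vector is min-max scaled, padded into a square grayscale image, and flipped for augmentation.

```python
# Sketch only: tabular traffic record -> grayscale image, plus flip augmentation.
# The feature count and image side length are illustrative assumptions.
import numpy as np

def features_to_image(features, side=8):
    """Min-max scale a feature vector to [0, 255] and pad/reshape it into a side x side image."""
    f = np.asarray(features, dtype=float)
    f = (f - f.min()) / (f.max() - f.min() + 1e-8) * 255.0
    padded = np.zeros(side * side)
    padded[:min(f.size, side * side)] = f[:side * side]
    return padded.reshape(side, side).astype(np.uint8)

def flip_augment(img):
    """Horizontal and vertical flips as simple data augmentation."""
    return [img, np.fliplr(img), np.flipud(img)]

record = np.random.rand(41)            # hypothetical 41-feature traffic record
images = flip_augment(features_to_image(record))
print(len(images), images[0].shape)    # 3 (8, 8)
```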
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 62476247 and 62072409), the "Pioneer" and "Leading Goose" R&D Program of Zhejiang (Grant No. 2024C01214), and the Zhejiang Provincial Natural Science Foundation (Grant No. LR21F020003).
Abstract: Text-guided molecule generation aims to produce molecules that match given text descriptions. Mainstream methods typically use the simplified molecular-input line-entry system (SMILES) to represent molecules and rely on diffusion models or autoregressive architectures for modeling. However, the one-to-many mapping that arises when representing molecules with SMILES forces existing methods to adopt complex model architectures and larger training datasets to improve performance, which reduces the efficiency of model training and generation. In this paper, we propose a text-guided diverse-expression diffusion (TGDD) model for molecule generation. TGDD combines SMILES and self-referencing embedded strings (SELFIES) into a novel diverse-expression molecular representation, enabling precise molecule mapping from natural language. By leveraging this diverse-expression representation, TGDD simplifies the segmented diffusion generation process, achieving faster training and lower memory consumption while exhibiting stronger alignment with natural language. TGDD outperforms both TGM-LDM and the autoregressive model MolT5-Base on most evaluation metrics.
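A minimal sketch of pairing the two representations, assuming the open-source selfies package (sf.encoder / sf.decoder); the joint-record format here is an illustrative assumption, not TGDD's data pipeline.

```python
# Sketch only: build a "diverse-expression" record holding both SMILES and
# SELFIES strings for the same molecule. Requires: pip install selfies
import selfies as sf

def diverse_expression(smiles):
    """Return both string representations of the same molecule."""
    selfies_str = sf.encoder(smiles)             # SMILES -> SELFIES
    assert sf.decoder(selfies_str) is not None   # SELFIES strings decode back to valid molecules
    return {"smiles": smiles, "selfies": selfies_str}

# Illustrative example: caffeine.
record = diverse_expression("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")
print(record["selfies"])
```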