Accurately identifying building distribution from remote sensing images with complex background information is challenging. The emergence of diffusion models has prompted the innovative idea of employing the reverse denoising process to distill building distribution from these complex backgrounds. Building on this concept, we propose a novel framework, the building extraction diffusion model (BEDiff), which refines the extraction of building footprints from remote sensing images in a stepwise fashion. Our approach begins with the design of booster guidance, a mechanism that extracts structural and semantic features from remote sensing images to serve as priors, thereby providing targeted guidance for the diffusion process. Additionally, we introduce a cross-feature fusion module (CFM) that bridges the semantic gap between different types of features, so that the attributes extracted by booster guidance are integrated into the diffusion process more effectively. BEDiff marks the first application of diffusion models to the task of building extraction. Extensive experiments on the Beijing building dataset demonstrate the superior performance of BEDiff, affirming its effectiveness and its potential for improving the accuracy of building extraction in complex urban landscapes.
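The reverse denoising that BEDiff relies on inverts a forward noising process. The sketch below shows that forward process in its generic form, assuming the standard DDPM formulation with a linear β schedule (1e-4 to 0.02 over T = 1000 steps, a common default). BEDiff's actual schedule and conditioning are not given in the abstract, and the 2x2 "mask" is a made-up example.

```python
import math
import random

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, noise=None, rng=random):
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in a single step."""
    if noise is None:
        noise = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]

# Usage: a flattened 2x2 binary "building mask" scaled to [-1, 1].
betas = linear_beta_schedule()
abar = alpha_bars(betas)
mask = [1.0, -1.0, -1.0, 1.0]
x_early = q_sample(mask, 10, abar)    # still close to the mask
x_late = q_sample(mask, 999, abar)    # close to pure Gaussian noise
```

A reverse (denoising) model is trained to undo these steps. In a conditional setup like the one described above, it would also receive image-derived priors as extra inputs.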
The application of generative artificial intelligence (AI) is bringing about notable changes in anime creation. This paper surveys recent advancements and applications of diffusion and language models in anime generation, focusing on their demonstrated potential to enhance production efficiency through automation and personalization. Despite these benefits, the substantial initial computational investment required for training and deploying these models must be acknowledged. We conduct an in-depth survey of cutting-edge generative AI technologies, including models such as Stable Diffusion and GPT, and appraise pivotal large-scale datasets alongside quantitative evaluation metrics. The surveyed literature indicates that AI models have reached considerable maturity in synthesizing high-quality, aesthetically compelling anime images from textual prompts, and that discernible progress has been made in generating coherent narratives. However, achieving perfect long-form consistency, mitigating artifacts such as flickering in video sequences, and enabling fine-grained artistic control remain critical open challenges. Building on these advancements, research has increasingly pivoted toward synthesizing higher-dimensional content, such as video and three-dimensional assets, with recent studies demonstrating significant progress in this burgeoning field. Nevertheless, formidable challenges endure. Foremost among them are the substantial computational demands of training and deploying these sophisticated models, which are particularly pronounced in high-dimensional generation such as video synthesis. Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and addressing ethical concerns surrounding bias and the preservation of human creative autonomy. This research underscores both the transformative potential and the inherent complexities of AI-driven synergy within the creative industries. We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models, the integration of multimodal inputs, and the balanced consideration of ethical implications, particularly regarding bias and the preservation of human creative autonomy, thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
Deep learning has achieved great progress in image recognition, segmentation, semantic recognition and game theory. In this study, a recent deep learning model, the conditional diffusion model, was adopted as a surrogate model to predict heat transfer during the casting process in place of numerical simulation. The conditional diffusion model was trained with the geometry shape, the initial temperature field and the temperature field at t_i as the condition, and random noise sampled from the standard normal distribution as the input. The output was the temperature field at t_{i+1}. Therefore, the temperature field at t_{i+1} can be predicted once the temperature field at t_i is known, and the continuous temperature fields of all time steps can be predicted from the initial temperature field of an arbitrary 2D geometry. A training set of 302 2D shapes and their simulated temperature fields at different time steps was established. The accuracy of the temperature field for a single time step reaches 97.7%, and that for continuous time steps reaches 69.1%, with most of the error located in the sand mold. The effects of the geometry shape and the initial temperature field on prediction accuracy were investigated; the former achieves better results than the latter because it identifies the casting, mold and chill by different colors in the input images. The diffusion model has demonstrated its potential as a surrogate for numerical simulation of the casting process.
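The prediction scheme described above is autoregressive: each predicted field T_{i+1} is fed back as the condition for the next step. That feedback is why single-step accuracy (97.7%) can exceed multi-step accuracy (69.1%). The sketch below shows only the rollout loop. The hypothetical `surrogate_step` stands in for the trained conditional diffusion model. Here it is just a 1D explicit heat-conduction update with a small deliberate bias, chosen to make error compounding visible. The maximum relative error used here is an illustrative metric, not the paper's accuracy definition.

```python
def reference_step(T, alpha=0.2):
    """Explicit 1D heat-conduction update with fixed (Dirichlet) end values."""
    inner = [T[j] + alpha * (T[j - 1] - 2 * T[j] + T[j + 1]) for j in range(1, len(T) - 1)]
    return [T[0]] + inner + [T[-1]]

def surrogate_step(T, geometry, T0, bias=0.002):
    """Hypothetical stand-in for the trained model: (geometry, T0, T_i) -> T_{i+1}.
    A real surrogate would denoise random noise conditioned on these inputs."""
    return [v * (1.0 + bias) for v in reference_step(T)]

def rollout(step, T0, geometry, n_steps):
    """Predict all time steps from the initial field by feeding each output back in."""
    fields = [T0]
    for _ in range(n_steps):
        fields.append(step(fields[-1], geometry, T0))
    return fields

def max_rel_error(pred, ref):
    return max(abs(p - r) / abs(r) for p, r in zip(pred, ref))

T0 = [1500.0] * 5 + [25.0] * 5          # hot casting next to a cold sand mold (toy values)
geometry = [1] * 5 + [0] * 5            # 1 = casting, 0 = mold (unused by the toy step)
pred = rollout(surrogate_step, T0, geometry, 20)
ref = rollout(lambda T, g, t0: reference_step(T), T0, geometry, 20)
errors = [max_rel_error(p, r) for p, r in zip(pred, ref)]
```

Because the toy step is linear, the error after k steps is exactly (1.002^k - 1). This shows how a small per-step error grows over a long rollout.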
The internal structures of cells, the basic units of life, are a major wonder of the microscopic world. Cellular images provide an intriguing window for exploring and understanding the composition and function of these structures. Combining scientific imagery with artistic expression can further expand the potential of imaging in educational dissemination and interdisciplinary applications.
Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems (ITS) in the real world. As a state-of-the-art generative model, the diffusion model has proven highly successful in image generation, speech generation, time series modelling, etc., and now opens a new avenue for traffic data imputation. In this paper, we propose a conditional diffusion model, called the implicit-explicit diffusion model, for traffic data imputation. This model exploits both the implicit and explicit features of the data simultaneously. More specifically, we design two types of feature extraction modules: one captures the implicit dependencies hidden in the raw data at multiple time scales, and the other obtains the long-term temporal dependencies of the time series. This approach not only inherits the advantages of the diffusion model for estimating missing data but also takes into account the multiscale correlation inherent in traffic data. To evaluate the model, extensive experiments are conducted on three real-world time series datasets under different missing rates. The experimental results demonstrate that the model improves imputation accuracy and generalization capability.
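In conditional-diffusion imputation, the observed entries act as the condition and only the missing entries are generated. The final estimate keeps the observed values and fills the gaps with the model's samples, and accuracy is scored only on held-out entries. The sketch below covers only that bookkeeping. The hypothetical `sample_missing` stands in for the implicit-explicit diffusion model; here it merely interpolates linearly between observed neighbours. The missing rate and toy series are arbitrary.

```python
import random

def make_mask(n, missing_rate, rng):
    """1 = observed, 0 = held out (treated as missing and used for scoring)."""
    return [0 if rng.random() < missing_rate else 1 for _ in range(n)]

def sample_missing(observed):
    """Hypothetical stand-in for the conditional diffusion sampler.
    Fills None entries by linear interpolation between observed neighbours."""
    obs = [i for i, v in enumerate(observed) if v is not None]
    out = list(observed)
    if not obs:
        return out
    for i, v in enumerate(observed):
        if v is not None:
            continue
        left = max((j for j in obs if j < i), default=None)
        right = min((j for j in obs if j > i), default=None)
        if left is None:
            out[i] = observed[right]
        elif right is None:
            out[i] = observed[left]
        else:
            w = (i - left) / (right - left)
            out[i] = (1 - w) * observed[left] + w * observed[right]
    return out

def impute(series, mask):
    """Hide masked entries from the sampler, then keep observed values as-is."""
    observed = [x if m else None for x, m in zip(series, mask)]
    filled = sample_missing(observed)
    return [x if m else f for x, f, m in zip(series, filled, mask)]

def masked_mae(truth, est, mask):
    """Mean absolute error over the held-out (mask == 0) positions only."""
    errs = [abs(t - e) for t, e, m in zip(truth, est, mask) if not m]
    return sum(errs) / len(errs) if errs else 0.0

rng = random.Random(0)
truth = [10.0 * i for i in range(20)]     # toy linear trend, e.g. a traffic speed ramp
mask = make_mask(len(truth), 0.3, rng)    # 30% missing rate
mask[0] = mask[-1] = 1                    # keep endpoints observed in this toy case
est = impute(truth, mask)
```

The key design point is that the sampler never sees the true values at masked positions. The error is therefore an honest estimate of imputation quality.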
Objective: This study aimed to explore a novel method that integrates segmentation-guided classification and diffusion model augmentation to automatically classify tibial plateau fractures (TPFs). Methods: YOLOv8n-cls was used to construct a baseline model on data from 3781 patients at the Orthopedic Trauma Center of Wuhan Union Hospital. Additionally, a segmentation-guided classification approach was proposed. To enlarge the dataset, a diffusion model was further used for data augmentation. Results: The novel method integrating segmentation-guided classification and diffusion model augmentation significantly improved the accuracy and robustness of fracture classification. The average classification accuracy for TPFs rose from 0.844 to 0.896. The overall performance of the dual-stream model was also significantly enhanced after many rounds of training, with both the macro-area under the curve (AUC) and the micro-AUC increasing from 0.94 to 0.97. With diffusion model augmentation and segmentation map integration, the model achieved an accuracy of 0.880 in identifying Schatzker Ⅰ fractures. It yielded an accuracy of 0.898 for Schatzker Ⅱ and Ⅲ and 0.913 for Schatzker Ⅳ; for Schatzker Ⅴ and Ⅵ, the accuracy was 0.887; and for intercondylar ridge fractures, the accuracy was 0.923. Conclusion: The dual-stream attention-based classification network, verified by many experiments, exhibited great potential in predicting the classification of TPFs. This method facilitates automatic TPF assessment and may assist surgeons in rapidly formulating surgical plans.
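The abstract reports both macro-AUC and micro-AUC. For a multi-class problem, the macro version averages the one-vs-rest AUC of each class. The micro version instead pools every (sample, class) score into a single binary problem. The stdlib sketch below computes both with the rank-based (Mann-Whitney) AUC. The class scores in the usage lines are made-up toy values, not data from the study.

```python
def binary_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_micro_auc(y_true, probs, n_classes):
    """y_true: class indices; probs: per-sample lists of class scores."""
    per_class = []
    pooled_labels, pooled_scores = [], []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]   # one-vs-rest labels for class c
        scores = [p[c] for p in probs]
        per_class.append(binary_auc(labels, scores))
        pooled_labels += labels
        pooled_scores += scores
    macro = sum(per_class) / n_classes                  # unweighted mean over classes
    micro = binary_auc(pooled_labels, pooled_scores)    # one pooled binary problem
    return macro, micro

# Usage with toy scores for three fracture classes.
probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
macro, micro = macro_micro_auc([0, 1, 2], probs, 3)
```

The pairwise loop is O(n²), which is fine for illustration. Library implementations sort the scores instead.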
The unprecedented scale of large models, such as large language models (LLMs) and text-to-image diffusion models, has raised critical concerns about the unauthorized use of copyrighted data during model training. These concerns have spurred a growing demand for dataset copyright auditing techniques, which aim to detect and verify potential infringements in the training data of commercial AI systems. This paper surveys existing auditing solutions, categorizing them along four key dimensions: data modality, model training stage, data overlap scenario, and model access level. We highlight major trends, including the prevalence of black-box auditing methods and the emphasis on fine-tuning rather than pre-training. Through an in-depth analysis of 12 representative works, we extract four key observations that reveal the limitations of current methods. Furthermore, we identify three open challenges and propose future directions for robust, multimodal, and scalable auditing solutions. Our findings underscore the urgent need to establish standardized benchmarks and to develop auditing frameworks that remain resilient at low watermark densities and apply across diverse deployment settings.
Supervised learning-based rail fastener anomaly detection models are limited by the scarcity of anomaly samples and perform poorly under data imbalance. Unsupervised anomaly detection methods based on diffusion models reduce the dependence on anomalous samples but suffer from too many iterations and excessive smoothing of reconstructed images. In this work, we establish a rail fastener anomaly detection framework called Diff-Fastener, which introduces the diffusion model into the fastener detection task. During training, half of the normal samples are converted into anomaly samples online, and One-Step denoising and canonical guided denoising paradigms replace iterative denoising, improving reconstruction efficiency while solving the excessive-smoothing problem. A Dilated Attention Convolution Module (DACM) is proposed in the middle layer of the reconstruction network to add detail to the reconstructed image; meanwhile, Sparse-Skip connections replace dense connections to reduce the computational load of the model and enhance its scalability. Exhaustive experiments on the MVTec, VisA, and railroad fastener datasets show that Diff-Fastener achieves 99.1% Image AUROC (Area Under the Receiver Operating Characteristic curve) and 98.9% Pixel AUROC on the railroad fastener dataset, outperforming existing models, and achieves the best average score on the MVTec and VisA datasets. Our research provides new ideas and directions for anomaly detection of rail fasteners.
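Iterative samplers need many reverse steps. A one-step alternative instead recovers an estimate of the clean image directly from a single noise prediction, using the DDPM identity x̂_0 = (x_t - √(1 - ᾱ_t)·ε̂) / √ᾱ_t. The sketch below shows that one-step reconstruction together with a per-pixel anomaly map |x - x̂_0|. It assumes a hypothetical `make_normal_only_predictor`, which plays the role of a network trained only on normal fasteners and therefore "explains away" any deviation from the normal pattern as noise. Diff-Fastener's canonical guided denoising and DACM are not reproduced here.

```python
import math
import random

def one_step_x0(x_t, eps_hat, abar_t):
    """Invert x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps for x0 in a single step."""
    return [(x - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t) for x, e in zip(x_t, eps_hat)]

def make_normal_only_predictor(normal_template):
    """Hypothetical noise predictor that has only ever seen the normal pattern."""
    def predict_noise(x_t, abar_t):
        return [(x - math.sqrt(abar_t) * n) / math.sqrt(1.0 - abar_t)
                for x, n in zip(x_t, normal_template)]
    return predict_noise

def anomaly_map(image, predict_noise, abar_t, rng):
    """Noise the input once, reconstruct in one step, score pixels by |x - x0_hat|."""
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    x_t = [math.sqrt(abar_t) * x + math.sqrt(1.0 - abar_t) * e for x, e in zip(image, noise)]
    x0_hat = one_step_x0(x_t, predict_noise(x_t, abar_t), abar_t)
    return [abs(x - r) for x, r in zip(image, x0_hat)]

normal = [0.0, 1.0, 1.0, 0.0]          # toy flattened "normal fastener" image
defective = [0.0, 1.0, 0.2, 0.0]       # pixel 2 is damaged
scores = anomaly_map(defective, make_normal_only_predictor(normal), 0.5, random.Random(0))
```

Only the damaged pixel receives a large score. An image-level score could then be taken as, for example, the maximum of the map before computing Image AUROC.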
Traditional steganography conceals information by modifying cover data, but steganalysis tools easily detect such alterations. Deep learning-based steganography, meanwhile, often involves high training costs and complex deployment, and diffusion model-based methods face security vulnerabilities, particularly potential information leakage during generation. We propose a fixed neural network image steganography framework based on secure diffusion models to address these challenges. Unlike conventional approaches, our method minimizes cover modifications through neural network optimization, achieving superior steganographic performance under both human visual perception and computer vision analysis. The cover images are generated in an anime style using state-of-the-art diffusion models, so that the transmitted images appear more natural. This study introduces fixed neural network technology that allows senders to transmit only minimal critical information alongside stego-images. Recipients can accurately reconstruct secret images from this compact data, significantly reducing transmission overhead compared with conventional deep steganography. Furthermore, our framework integrates ElGamal, a public-key cryptographic algorithm, to protect critical information during transmission, enhancing overall system security and ensuring end-to-end information protection. This dual optimization of payload reduction and cryptographic reinforcement establishes a new paradigm for secure and efficient image steganography.
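ElGamal, which the framework uses to protect the critical information sent alongside the stego-image, is a public-key scheme. The receiver publishes h = g^x mod p, and the sender encrypts a message m as the pair (g^k, m·h^k mod p) using a fresh random k. The textbook sketch below uses deliberately tiny, insecure toy parameters (p = 2357, g = 2). Real deployments use large, vetted groups and an audited library. The paper's parameters and message encoding are not given in the abstract, and the byte-wise encoding below is an illustrative assumption.

```python
import secrets

P = 2357  # toy prime modulus: far too small for real use
G = 2     # toy base

def keygen():
    x = secrets.randbelow(P - 2) + 1           # private key in [1, P-2]
    return x, pow(G, x, P)                     # (private, public)

def encrypt(m, h):
    """Encrypt an integer 0 < m < P under public key h."""
    if not 0 < m < P:
        raise ValueError("message must be in (0, P)")
    k = secrets.randbelow(P - 2) + 1           # fresh ephemeral key per message
    return pow(G, k, P), (m * pow(h, k, P)) % P

def decrypt(c, x):
    c1, c2 = c
    s = pow(c1, x, P)                          # shared secret g^(x*k) mod P
    return (c2 * pow(s, -1, P)) % P            # multiply by s^-1 mod P (Python 3.8+)

# Usage: protect a few bytes of "critical information" (illustrative only).
x, h = keygen()
key_bytes = b"\x00\x07\xff"
cipher = [encrypt(b + 1, h) for b in key_bytes]   # shift by 1 so 0x00 is encryptable
recovered = bytes(decrypt(c, x) - 1 for c in cipher)
```

Encrypting byte by byte with a small modulus is only for readability. A practical system would encrypt a symmetric key with ElGamal (hybrid encryption) and protect the payload with that key.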
Text-guided molecule generation aims to generate molecules that match given text descriptions. Mainstream methods typically represent molecules with the simplified molecular input line entry system (SMILES) and rely on diffusion models or autoregressive structures for modeling. However, because a single molecule can be written as many different SMILES strings (a one-to-many mapping), existing methods require complex model architectures and larger training datasets to improve performance, which reduces the efficiency of model training and generation. In this paper, we propose a text-guided diverse-expression diffusion (TGDD) model for molecule generation. TGDD combines SMILES and self-referencing embedded strings (SELFIES) into a novel diverse-expression molecular representation, enabling precise molecule mapping from natural language. By leveraging this representation, TGDD simplifies the segmented diffusion generation process, achieving faster training and reduced memory consumption while exhibiting stronger alignment with natural language. TGDD outperforms both TGM-LDM and the autoregressive model MolT5-Base on most evaluation metrics.
Federated semi-supervised learning (FSSL) faces two major challenges: the scarcity of labeled data across clients and the non-independent and identically distributed (Non-IID) nature of data among clients. To address these issues, we propose diffusion model-based data synthesis aided FSSL (DDSA-FSSL), a novel approach that leverages a diffusion model (DM) to generate synthetic data, thereby bridging the gap between heterogeneous local data distributions and the global data distribution. In DDSA-FSSL, each client addresses the scarcity of labeled data by using a classifier trained with federated learning to pseudo-label its unlabeled data. The DM is then collaboratively trained on both labeled and precision-optimized pseudo-labeled data, enabling clients to generate synthetic samples for classes that are absent from their labeled datasets. As a result, the disparity between local and global distributions is reduced, and clients can create enriched synthetic datasets that better align with the global data distribution. Extensive experiments on various datasets and Non-IID scenarios demonstrate the effectiveness of DDSA-FSSL, with significant performance improvements such as an increase in accuracy from 38.46% to 52.14% on CIFAR-10 with 10% labeled data.
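Three generic building blocks underlie the pipeline described above. The server averages client models (FedAvg), each client pseudo-labels its unlabeled data with the shared classifier, and each client determines which classes are missing from its labeled set so the DM can generate them. The sketch below illustrates these steps under stated assumptions. Models are flat weight lists, and "precision-optimized" pseudo-labeling is approximated by a fixed confidence threshold (the abstract does not describe the actual selection rule).

```python
def fedavg(client_weights, client_sizes):
    """Weighted average of client parameter vectors (FedAvg aggregation)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]

def pseudo_label(probs, threshold=0.9):
    """Keep (index, label) pairs whose top class probability clears the threshold."""
    kept = []
    for idx, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)   # argmax over classes
        if p[label] >= threshold:
            kept.append((idx, label))
    return kept

def missing_classes(labeled_labels, n_classes):
    """Classes absent from a client's labeled data: candidates for DM synthesis."""
    return sorted(set(range(n_classes)) - set(labeled_labels))

# Usage with toy values.
global_w = fedavg([[1.0, 2.0], [3.0, 6.0]], [1, 3])
kept = pseudo_label([[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]])
gaps = missing_classes([0, 0, 2], 4)
```

A higher threshold trades pseudo-label quantity for precision. This is the same tension that the precision-optimized selection described above is meant to manage.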
Air target intent recognition is important for helping commanders assess battlefield situations and gain a competitive edge in decision-making. Progress in this domain has been hindered by imbalanced battlefield data and the limited robustness of traditional recognition models. Inspired by the success of diffusion models in addressing sample imbalance in the visual domain, this paper introduces an approach that uses the Markov Transition Field (MTF) method to visualize time series data. Combined with the Denoising Diffusion Probabilistic Model (DDPM), this visualization effectively augments the sample data and mitigates noise in the original dataset. Additionally, a transformer-based model tailored to time series visualization and air target intent recognition is developed. Comprehensive experiments, including comparative, ablation, and denoising validations, show that the proposed method achieves 98.86% accuracy in air target intent recognition while demonstrating strong robustness and generalization. This approach represents a promising avenue for advancing air target intent recognition.
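The Markov Transition Field turns a 1D series into an image in three steps. First, values are assigned to Q quantile bins. Second, a Q×Q first-order transition matrix W is estimated from consecutive bin pairs. Third, pixel (i, j) of the field is set to W[q_i][q_j], the probability of moving from the bin of x_i to the bin of x_j. The stdlib sketch below implements this construction. The bin count and toy series are arbitrary choices, since the paper's settings are not given in the abstract.

```python
import bisect

def quantile_bins(series, n_bins):
    """Assign each value to one of n_bins equal-frequency (quantile) bins."""
    s = sorted(series)
    cuts = [s[k * len(s) // n_bins] for k in range(1, n_bins)]
    return [bisect.bisect_right(cuts, v) for v in series]

def transition_matrix(bins, n_bins):
    """Row-normalised counts of bin a at time t followed by bin b at time t+1."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1
    W = []
    for row in counts:
        total = sum(row)
        W.append([c / total if total else 0.0 for c in row])
    return W

def markov_transition_field(series, n_bins=4):
    """n x n image whose pixel (i, j) is W[bin(x_i)][bin(x_j)]."""
    bins = quantile_bins(series, n_bins)
    W = transition_matrix(bins, n_bins)
    return [[W[bi][bj] for bj in bins] for bi in bins]

# Usage: a toy monotone series, e.g. a steadily rising altitude track.
series = [1, 2, 3, 4, 5, 6, 7, 8]
M = markov_transition_field(series, 4)
```

The resulting n×n image can then be treated like any other picture, for example as training data for an image diffusion model.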
With the rapid development of Internet of Things technology, the sharp increase in network devices, set against their inherent security vulnerabilities, brings unprecedented challenges to network security, especially in identifying malicious attacks. However, network traffic data are unevenly distributed, with imbalance both between attack traffic and normal traffic and between minority-class and majority-class attacks. As a result, traditional machine learning detection algorithms have significant limitations when dealing with sparse network traffic data. To tackle this challenge, we design a lightweight intrusion detection model based on diffusion mechanisms, named Diff-IDS, whose core objective is to improve the model's efficiency in parsing complex network traffic features, thereby significantly improving its detection speed and training efficiency. The model first filters network traffic features and converts them into grayscale images, employing image flipping for data augmentation. These preprocessed images are then fed into a diffusion model based on the U-Net architecture for training. Once the model is trained, we fix the weights of the U-Net and propose a feature enhancement algorithm based on feature masking to further boost the model's expressiveness. Finally, we devise an end-to-end lightweight detection strategy that streamlines the model, enabling efficient lightweight detection of imbalanced samples. Our method has been tested in multiple experiments on well-known network intrusion detection benchmarks, including CICIDS 2017, KDD 99, and NSL-KDD. The experimental results indicate that Diff-IDS outperforms current state-of-the-art models in detection accuracy, training efficiency, and lightweight metrics, demonstrating strong detection capability and robustness.
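The preprocessing step described above (traffic features, then a grayscale image, then flip augmentation) can be sketched as follows, under stated assumptions. Each feature is min-max scaled to an integer from 0 to 255 using ranges fitted on training data. The pixels are then zero-padded to the smallest square and flipped horizontally. The paper's exact feature filtering and image size are not specified in the abstract, and the toy rows below are made up.

```python
import math

def fit_minmax(rows):
    """Per-feature (min, max) computed on training rows."""
    return [(min(c), max(c)) for c in zip(*rows)]

def to_grayscale_image(features, ranges):
    """Scale each feature to an integer 0..255, zero-pad, reshape to side x side."""
    pixels = []
    for v, (lo, hi) in zip(features, ranges):
        scaled = 0.0 if hi == lo else (v - lo) / (hi - lo)
        scaled = min(max(scaled, 0.0), 1.0)              # clip values outside the training range
        pixels.append(round(scaled * 255))
    side = math.isqrt(len(pixels) - 1) + 1 if pixels else 0  # smallest side with side*side >= n
    pixels += [0] * (side * side - len(pixels))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]

def hflip(image):
    """Horizontal flip used as a simple augmentation."""
    return [row[::-1] for row in image]

# Usage with toy flow records (3 features each).
ranges = fit_minmax([[0, 10, 5], [10, 20, 5]])
img = to_grayscale_image([0, 20, 7], ranges)
augmented = hflip(img)
```

Fitting the ranges on training data only, then clipping at inference, keeps unseen traffic from producing out-of-range pixel values.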
AlphaPanda (AlphaFold2[1]-inspired protein-specific antibody design in a diffusional manner) is an advanced algorithm for designing the complementarity-determining regions (CDRs) of an antibody targeting a specific epitope. It combines transformer[2] models, 3DCNN[3], and diffusion[4] generative models.
The effects of ultrasonic bonding parameters on atomic diffusion, the microstructure at the Al-Au interface, and the shear strength of Al-Au ultrasonic bonds were investigated by combining experiments and finite element (FE) simulation. A quantitative model of atomic diffusion, relating diffusion to the ultrasonic bonding parameters, time and distance, is established to calculate atomic diffusion at the Al-Au interface. The maximum relative error between the calculated and experimental fractions of Al atoms is 7.35%, indicating the high prediction accuracy of this model. During ultrasonic bonding, Au8Al3 is the main intermetallic compound (IMC) at the Al-Au interface. With larger bonding forces, higher ultrasonic powers and longer bonding times, it becomes more difficult to remove oxide particles from the Al-Au interface, which hinders atomic diffusion. Therefore, the complicated stress state and the presence of oxide particles both promote the formation of holes. The shear strength of Al-Au ultrasonic bonds increases with increasing bonding force, ultrasonic power and bonding time. However, taking into account the holes that form under particular parameters, the optimal ultrasonic bonding parameters are determined to be a bonding force of 23 gf, an ultrasonic power of 75 mW and a bonding time of 21 ms.
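The paper's own diffusion model, which also depends on the bonding parameters, is not given in the abstract. As a point of reference only, consider the textbook solution of Fick's second law for a semi-infinite diffusion couple with constant diffusivity D: C(x, t) = (C_0 / 2)·erfc(x / (2√(Dt))). A model of this kind can be checked against measurements using the maximum relative error, the metric quoted above. The sketch below uses illustrative, made-up values for D, t and x.

```python
import math

def couple_profile(x, t, D, c0=1.0):
    """Fick's second law, semi-infinite couple, constant D:
    C(x, t) = c0 / 2 * erfc(x / (2 * sqrt(D * t))), interface at x = 0."""
    return 0.5 * c0 * math.erfc(x / (2.0 * math.sqrt(D * t)))

def max_relative_error(calculated, measured):
    """max |calc - meas| / |meas| over all measurement points."""
    return max(abs(c - m) / abs(m) for c, m in zip(calculated, measured))

# Illustrative only: D in m^2/s, t in s, x in m (values are not from the paper).
D, t = 1e-15, 0.021
xs = [0.0, 2e-9, 5e-9, 1e-8]
calc = [couple_profile(x, t, D) for x in xs]
```

Here C is the fraction of the diffusing species that started on the x < 0 side. A parameter-dependent model, such as the one described in the abstract, would in effect make D a function of bonding force, ultrasonic power and time.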
High resolution of post-stack seismic data supports better interpretation of subsurface structures and more accurate impedance inversion. Therefore, geophysicists consistently strive to acquire higher-resolution seismic images in petroleum exploration. Conventional signal processing and machine learning have been successfully applied to post-stack seismic resolution enhancement. However, there are few seismic applications of generative artificial intelligence, despite its recent emergence and rapid development. Hence, we propose to apply diffusion models, among the most popular generative models, to enhance seismic resolution. Specifically, we apply the classic diffusion model, the denoising diffusion probabilistic model (DDPM), conditioned on low-resolution seismic data, to reconstruct the corresponding high-resolution images. We refer to the entire scheme as SeisResoDiff. To provide a comprehensive and clear understanding of SeisResoDiff, we introduce the basic theory of diffusion models and detail the derivation of the optimization objective with the aid of diagrams and algorithms. For implementation, we first propose a practical workflow to acquire abundant training data based on generated pseudo-wells. We then apply the trained model to both synthetic and field datasets, evaluating the results in three aspects: the appearance of seismic sections and slices in the time domain, frequency spectra, and comparisons with synthetic data generated from real well-logging data at the well locations. The results demonstrate not only effective seismic resolution enhancement but also additional denoising by the diffusion model. Experimental comparisons indicate that training the model on noisy data, which is more realistic, outperforms training on clean data. The proposed scheme outperforms some conventional methods in high-resolution reconstruction and denoising, and yields more competitive results than our previous research.
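In the DDPM formulation described above, the network is trained to predict the noise ε added in the closed-form forward step. Sampling then uses the reverse mean μ_θ(x_t, t) = (x_t - β_t / √(1 - ᾱ_t)·ε_θ) / √α_t, which equals the true posterior mean μ̃_t(x_t, x_0) when ε_θ is exact. In SeisResoDiff, the noise predictor would also receive the low-resolution section as a condition. The sketch below leaves that conditioning abstract and simply checks numerically that the two expressions agree. It assumes a linear β schedule and uses a made-up three-sample trace.

```python
import math

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alphas = [1.0 - b for b in betas]
abar = []
prod = 1.0
for a in alphas:
    prod *= a
    abar.append(prod)

def forward(x0, eps, t):
    """x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return [math.sqrt(abar[t]) * x + math.sqrt(1 - abar[t]) * e for x, e in zip(x0, eps)]

def mean_from_eps(x_t, eps, t):
    """Reverse-step mean used at sampling time (epsilon parameterisation)."""
    c = betas[t] / math.sqrt(1 - abar[t])
    return [(x - c * e) / math.sqrt(alphas[t]) for x, e in zip(x_t, eps)]

def posterior_mean(x_t, x0, t):
    """Mean of q(x_{t-1} | x_t, x_0); valid for t >= 1."""
    c0 = math.sqrt(abar[t - 1]) * betas[t] / (1 - abar[t])
    ct = math.sqrt(alphas[t]) * (1 - abar[t - 1]) / (1 - abar[t])
    return [c0 * a + ct * b for a, b in zip(x0, x_t)]

def eps_loss(eps_true, eps_pred):
    """Simplified DDPM training objective: mean squared error on the noise."""
    return sum((a - b) ** 2 for a, b in zip(eps_true, eps_pred)) / len(eps_true)

# Usage: a toy "high-resolution trace" and a fixed noise draw.
x0 = [0.3, -1.2, 0.8]
eps = [0.5, -0.1, 1.7]
x_t = forward(x0, eps, 500)
```

This identity is why training only on the noise-prediction loss is sufficient to define the reverse sampler.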
With the rapid growth of manuscript submissions, finding eligible reviewers for every submission has become a heavy task. Recommender systems, developed in computer science and information science, are powerful tools for this problem. However, most existing approaches rely on text mining techniques to match manuscripts with potential reviewers, which requires high-quality textual information to perform well. In this paper, we propose a reviewer recommendation algorithm based on a network diffusion process on a scholar-paper multilayer network, which requires no textual information. The network incorporates scholar-paper relationships, collaboration among scholars, and bibliographic coupling among papers. Experimental results show that our algorithm outperforms state-of-the-art recommendation methods based on graph random walks and matrix factorization, as well as methods based on machine learning and natural language processing, with improvements of over 7.62% in recall, 5.66% in hit rate, and 47.53% in ranking score. Our work demonstrates the effectiveness of multilayer network diffusion-based methods for the reviewer recommendation problem, which will help facilitate the peer-review process and promote information retrieval research in other practical settings.
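Network diffusion recommenders typically spread "resource" over a graph and rank nodes by how much they receive. The paper's multilayer network, with its collaboration and bibliographic-coupling layers, is not reproduced here. Instead, the sketch below gives a single-layer illustration on the scholar-paper bipartite graph using classic mass diffusion. A unit of resource starts on each seed paper related to the submission. It is split equally among that paper's authors, flows back to their papers, and then returns to scholars. The toy graph, the names and the choice of seed papers are all hypothetical.

```python
from collections import defaultdict

def spread(resource, neighbours):
    """One mass-diffusion step: each node splits its resource equally among its
    neighbours (resource on a node with no neighbours is dropped)."""
    out = defaultdict(float)
    for node, r in resource.items():
        nbrs = neighbours.get(node, [])
        for nb in nbrs:
            out[nb] += r / len(nbrs)
    return dict(out)

def invert(paper_authors):
    """Build the scholar -> papers adjacency from paper -> authors."""
    scholar_papers = defaultdict(list)
    for paper, authors in paper_authors.items():
        for a in authors:
            scholar_papers[a].append(paper)
    return dict(scholar_papers)

def recommend(seed_papers, paper_authors, exclude=()):
    """papers -> scholars -> papers -> scholars, then rank scholars by resource."""
    scholar_papers = invert(paper_authors)
    r = {p: 1.0 for p in seed_papers}
    r = spread(r, paper_authors)
    r = spread(r, scholar_papers)
    r = spread(r, paper_authors)
    return sorted(((s, v) for s, v in r.items() if s not in exclude),
                  key=lambda kv: (-kv[1], kv[0]))

paper_authors = {"p1": ["alice", "bob"], "p2": ["bob"], "p3": ["carol"], "p4": ["carol", "dave"]}
ranked = recommend(["p1"], paper_authors)
```

The `exclude` argument is where conflicts of interest, such as the manuscript's own authors, would be filtered out. Bob ranks above Alice because his second paper routes extra resource back to him.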
In this study, the fluid flow and mixing process in an impinging stream-rotating packed bed (IS-RPB) are simulated using a new three-dimensional computational fluid dynamics model. Specifically, the gas-liquid flow is simulated by the Euler-Euler model, the hydrodynamics of the reactor is predicted by the RNG k-ε method, and the high-gravity environment is simulated by the sliding mesh model. The turbulent mass transfer process is characterized by the concentration variance (c'^2) and its dissipation rate (ε_c) formulations, so the turbulent mass diffusivity can be obtained directly. The simulated segregation index X_s agrees with our previous experimental results. The simulated results reveal that the fringe effect of the IS can be offset by the end effect at the inner radius of the RPB, so investigating the coupling mechanism between the IS and the RPB is critical to intensifying the mixing process in the IS-RPB.
Transformer models have emerged as dominant networks for various computer vision tasks, overtaking Convolutional Neural Networks (CNNs). Transformers can model long-range dependencies by using a self-attention mechanism. This study aims to provide a comprehensive survey of recent transformer-based approaches in image and video applications, as well as diffusion models. We begin by discussing existing surveys of vision transformers and comparing them with this work. Then, we review the main components of a vanilla transformer network, including the self-attention mechanism, feed-forward network, and position encoding. In the main part of this survey, we review recent transformer-based models in three categories: Transformer for Downstream Tasks, Vision Transformer for Generation, and Vision Transformer for Segmentation. We also provide a comprehensive overview of recent transformer models for video tasks and diffusion models. We compare the performance of various hierarchical transformer networks on multiple tasks using popular benchmark datasets. Finally, we explore future research directions to further improve the field.
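The self-attention mechanism reviewed above computes softmax(QK^T / √d_k)V, where the queries, keys and values are linear projections of the same token sequence. The sketch below is a minimal single-head, stdlib-only implementation. It omits masking, multi-head splitting and positional encoding, and its projection matrices are toy values rather than learned weights.

```python
import math

def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)                                   # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the token rows of X."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    Kt = [list(c) for c in zip(*K)]                # K transposed
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, Kt)]
    weights = [softmax(row) for row in scores]     # each row sums to 1
    return matmul(weights, V), weights

# Usage: three 2-dimensional tokens with identity projections.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
out, w = self_attention(X, I, I, I)
```

Every token attends to every other token in a single step. This is the source of the long-range modeling ability, and it is also why the cost grows quadratically with sequence length.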
We present a new Dirichlet boundary condition for rate-type non-Newtonian diffusive constitutive models. The newly proposed boundary condition is compared with two well-known and widely used boundary conditions: the pure Neumann condition and the Dirichlet condition of Sureshkumar and Beris. Our condition is shown to be more stable and robust in a number of numerical test cases. The new Dirichlet boundary condition is implemented in the framework of the finite difference Marker and Cell (MAC) method. In this paper, we also present an energy-stable finite difference MAC scheme that preserves the positivity of the conformation tensor, and show how adding diffusion helps energy stability in a finite difference MAC setting.
Funding for the building extraction diffusion model (BEDiff) paper: supported by the National Natural Science Foundation of China (Nos. 61906168, 62202429 and 62272267), the Zhejiang Provincial Natural Science Foundation of China (No. LY23F020023), and the Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects (No. 2022SDSJ01).
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62202210).
Abstract: Generative artificial intelligence (AI) is bringing notable changes to anime creation. This paper surveys recent advances and applications of diffusion and language models in anime generation, focusing on their demonstrated potential to improve production efficiency through automation and personalization. We survey cutting-edge generative AI technologies, including models such as Stable Diffusion and GPT, and appraise key large-scale datasets along with quantitative evaluation metrics. The surveyed literature indicates that AI models have become considerably mature at synthesizing high-quality, aesthetically compelling anime images from text prompts, with discernible progress in generating coherent narratives. However, achieving long-form consistency, suppressing artifacts such as flickering in video sequences, and enabling fine-grained artistic control remain critical open challenges. Building on these advances, research has increasingly pivoted toward synthesizing higher-dimensional content, such as video and three-dimensional assets, with recent studies reporting significant progress in this emerging field. Formidable challenges nevertheless remain. Foremost among them are the substantial computational costs of training and deploying these models, which are particularly pronounced for high-dimensional generation such as video synthesis. Further persistent hurdles include maintaining spatial-temporal consistency across complex scenes and addressing ethical concerns surrounding bias and the preservation of human creative autonomy. This research underscores both the transformative potential and the inherent complexities of AI-driven synergy within the creative industries. We posit that future research should focus on fusing diffusion and autoregressive models, integrating multimodal inputs, and weighing ethical implications, particularly bias and human creative autonomy, thereby establishing a robust foundation for advancing anime creation and AI-driven content generation more broadly.
Funding: Sponsored by the Tsinghua-Toyota Joint Research Fund.
Abstract: Deep learning has achieved great progress in image recognition, segmentation, semantic recognition and game theory. In this study, a recent deep learning model, the conditional diffusion model, was adopted as a surrogate model to predict heat transfer during the casting process in place of numerical simulation. The conditional diffusion model was established and trained with the geometry, the initial temperature field and the temperature field at t_i as the condition, and random noise sampled from the standard normal distribution as the input. The output was the temperature field at t_{i+1}. Therefore, the temperature field at t_{i+1} can be predicted once the temperature field at t_i is known, and the temperature fields at all time steps can be predicted in succession from the initial temperature field of an arbitrary 2D geometry. A training set of 302 2D shapes and their simulated temperature fields at different time steps was established. The prediction accuracy reaches 97.7% for a single time step and 69.1% for continuous time steps, with most of the error located in the sand mold. The effects of the geometry shape and the initial temperature field on prediction accuracy were investigated; the former yields better results than the latter because it identifies the casting, mold and chill by different colors in the input images. The diffusion model has shown potential as a surrogate for numerical simulation of the casting process.
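Because each prediction becomes the condition for the next step, multi-step prediction is an autoregressive rollout; one plausible reason continuous-step accuracy is lower than single-step accuracy is that early errors are fed back as inputs. The sketch below shows only that rollout structure. `predict_next` is a hypothetical interface, and `toy_heat_step` (an explicit finite-difference conduction update with periodic edges) is a stand-in for the trained conditional diffusion model, not the paper's method.

```python
import numpy as np

def rollout(predict_next, geometry, t_init, n_steps):
    """Autoregressive rollout: the field predicted for t_{i+1} is fed back as the
    condition for the next step. Returns [T_0, T_1, ..., T_n_steps]."""
    fields = [t_init]
    for _ in range(n_steps):
        fields.append(predict_next(geometry, t_init, fields[-1]))
    return fields

def toy_heat_step(geometry, t_init, t_now, alpha=0.2):
    """Hypothetical stand-in for the trained conditional diffusion model: one explicit
    finite-difference conduction step with periodic edges (stable for alpha <= 0.25).
    geometry and t_init are accepted to match the interface but unused here."""
    lap = (np.roll(t_now, 1, axis=0) + np.roll(t_now, -1, axis=0)
           + np.roll(t_now, 1, axis=1) + np.roll(t_now, -1, axis=1) - 4.0 * t_now)
    return t_now + alpha * lap
```

With a learned model, `predict_next` would sample the next field from noise given the geometry image, the initial field and the current field as the condition.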
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 226-2024-00038), China.
Abstract: The internal structures of cells, the basic units of life, are a major wonder of the microscopic world. Cellular images provide an intriguing window for exploring and understanding the composition and function of these structures. Combining scientific imagery with artistic expression can further expand the potential of imaging in educational dissemination and interdisciplinary applications.
Funding: Partially supported by the National Natural Science Foundation of China (62271485) and the SDHS Science and Technology Project (HS2023B044).
Abstract: Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems (ITS) in the real world. As a state-of-the-art generative model, the diffusion model has proven highly successful in image generation, speech generation, time series modelling, etc., and now opens a new avenue for traffic data imputation. In this paper, we propose a conditional diffusion model, called the implicit-explicit diffusion model, for traffic data imputation. This model exploits both the implicit and explicit features of the data simultaneously. More specifically, we design two types of feature extraction modules: one captures the implicit dependencies hidden in the raw data at multiple time scales, and the other obtains the long-term temporal dependencies of the time series. This approach not only inherits the advantages of the diffusion model for estimating missing data but also accounts for the multiscale correlation inherent in traffic data. To evaluate the model, extensive experiments are conducted on three real-world time series datasets with different missing rates. The experimental results demonstrate that the model improves imputation accuracy and generalization capability.
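Imputation experiments at different missing rates are commonly set up by hiding a random fraction of observed entries and scoring only those entries. A minimal sketch, assuming random point-wise missingness and a (time steps, sensors) layout, neither of which the abstract specifies; `mean_impute` is a naive baseline included only to show where an imputation model plugs in.

```python
import numpy as np

def random_missing_mask(shape, missing_rate, rng):
    """Boolean mask, True where an observed value is hidden for evaluation."""
    return rng.random(shape) < missing_rate

def mean_impute(data, mask):
    """Naive baseline: fill hidden entries with each sensor's (column's) observed mean."""
    observed = np.where(mask, np.nan, data)
    col_means = np.nanmean(observed, axis=0)
    return np.where(mask, col_means[None, :], data)

def masked_mae(truth, imputed, mask):
    """Mean absolute error over the hidden entries only."""
    return float(np.abs(truth[mask] - imputed[mask]).mean())
```

A diffusion-based imputer would replace `mean_impute`, receiving the visible entries as its condition and generating values only where `mask` is True.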
Funding: Supported by the National Natural Science Foundation of China (Nos. 81974355 and 82172524), the Key Research and Development Program of Hubei Province (No. 2021BEA161), the National Innovation Platform Development Program (No. 2020021105012440), the Open Project Funding of the Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University (No. 2024BDIAA03), and the Free Innovation Preliminary Research Fund of Wuhan Union Hospital (No. 2024XHYN047).
Abstract: Objective: This study aimed to explore a novel method that integrates segmentation-guided classification and diffusion model augmentation to automatically classify tibial plateau fractures (TPFs). Methods: YOLOv8n-cls was used to construct a baseline model on data from 3781 patients at the Orthopedic Trauma Center of Wuhan Union Hospital. Additionally, a segmentation-guided classification approach was proposed. To enlarge the dataset, a diffusion model was further used for data augmentation. Results: Integrating segmentation-guided classification and diffusion model augmentation significantly improved the accuracy and robustness of fracture classification. The average classification accuracy for TPFs rose from 0.844 to 0.896. The overall performance of the dual-stream model was also significantly enhanced after many rounds of training, with both the macro-area under the curve (AUC) and the micro-AUC increasing from 0.94 to 0.97. With diffusion model augmentation and segmentation map integration, the model achieved an accuracy of 0.880 for Schatzker Ⅰ, 0.898 for Schatzker Ⅱ and Ⅲ, 0.913 for Schatzker Ⅳ, 0.887 for Schatzker Ⅴ and Ⅵ, and 0.923 for intercondylar ridge fractures. Conclusion: The dual-stream attention-based classification network, verified in multiple experiments, shows great potential for predicting TPF classification. This method facilitates automatic TPF assessment and may help surgeons formulate surgical plans rapidly.
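The macro- and micro-AUC figures reported above aggregate one-vs-rest ROC curves differently: macro-AUC averages the per-class AUCs, while micro-AUC pools every (sample, class) decision into one binary problem. A minimal NumPy sketch of both, computing AUC through the Mann-Whitney rank statistic; the function names are illustrative, and established libraries such as scikit-learn offer more complete ROC AUC utilities.

```python
import numpy as np

def binary_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive outranks a random negative (ties count one half)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

def macro_micro_auc(labels, probs):
    """One-vs-rest AUCs for integer class labels and (n, K) class probabilities."""
    probs = np.asarray(probs, dtype=float)
    onehot = np.eye(probs.shape[1], dtype=bool)[np.asarray(labels)]
    macro = np.mean([binary_auc(onehot[:, k], probs[:, k]) for k in range(probs.shape[1])])
    micro = binary_auc(onehot.ravel(), probs.ravel())
    return float(macro), float(micro)
```

Each class must have at least one positive and one negative sample in the evaluation set; otherwise its per-class AUC is undefined.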
Funding: Supported in part by NSFC under Grant Nos. 62402379, U22A2029 and U24A20237.
Abstract: The unprecedented scale of large models, such as large language models (LLMs) and text-to-image diffusion models, has raised critical concerns about the unauthorized use of copyrighted data during model training. These concerns have spurred a growing demand for dataset copyright auditing techniques, which aim to detect and verify potential infringements in the training data of commercial AI systems. This paper presents a survey of existing auditing solutions, categorizing them across key dimensions: data modality, model training stage, data overlap scenarios, and model access levels. We highlight major trends, including the prevalence of black-box auditing methods and the emphasis on fine-tuning rather than pre-training. Through an in-depth analysis of 12 representative works, we extract four key observations that reveal the limitations of current methods. Furthermore, we identify three open challenges and propose future directions for robust, multimodal, and scalable auditing solutions. Our findings underscore the urgent need to establish standardized benchmarks and develop auditing frameworks that are resilient to low watermark densities and applicable in diverse deployment settings.
Funding: Funded by the National Natural Science Foundation of China, grant numbers 52272385 and 52475085.
Abstract: Supervised learning-based rail fastener anomaly detection models are limited by the scarcity of anomaly samples and perform poorly under data imbalance. Unsupervised anomaly detection methods based on diffusion models reduce the dependence on anomalous samples but suffer from too many iterations and excessive smoothing of reconstructed images. In this work, we establish a rail fastener anomaly detection framework called Diff-Fastener, which introduces the diffusion model into the fastener detection task. During training, half of the normal samples are converted into anomaly samples online, and one-step denoising and canonical guided denoising paradigms replace iterative denoising, improving the reconstruction efficiency of the model while solving the problem of excessive smoothing. A DACM (Dilated Attention Convolution Module) is proposed in the middle layer of the reconstruction network to enrich the detail of the reconstructed image; meanwhile, Sparse-Skip connections replace dense connections to reduce the computational load of the model and enhance its scalability. Exhaustive experiments on the MVTec, VisA, and railroad fastener datasets show that Diff-Fastener achieves 99.1% image AUROC (Area Under the Receiver Operating Characteristic curve) and 98.9% pixel AUROC on the railroad fastener dataset, outperforming existing models, and achieves the best average score on the MVTec and VisA datasets. Our research provides new ideas and directions for anomaly detection of rail fasteners.
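The abstract states that half of the normal samples are converted into anomaly samples online during training but does not say how. A minimal sketch, assuming a simple random-patch corruption (a common synthetic-anomaly trick, not necessarily Diff-Fastener's): a random half of each batch receives a rectangle of uniform noise, and a pixel mask records where, which can serve as a target for pixel-level supervision or evaluation.

```python
import numpy as np

def synthesize_anomalies(batch, rng, max_frac=0.25):
    """Corrupt a random half of a batch of normal images with noise patches.

    batch: float array of shape (N, H, W) with values in [0, 1] (left unmodified).
    Returns (images, pixel_masks, image_labels); label 1 marks a synthetic anomaly.
    """
    images = batch.copy()
    n, h, w = images.shape
    masks = np.zeros(images.shape, dtype=bool)
    labels = np.zeros(n, dtype=int)
    max_h, max_w = max(1, int(h * max_frac)), max(1, int(w * max_frac))
    for i in rng.choice(n, size=n // 2, replace=False):
        ph = rng.integers(1, max_h + 1)   # patch height
        pw = rng.integers(1, max_w + 1)   # patch width
        top = rng.integers(0, h - ph + 1)
        left = rng.integers(0, w - pw + 1)
        images[i, top:top + ph, left:left + pw] = rng.random((ph, pw))
        masks[i, top:top + ph, left:left + pw] = True
        labels[i] = 1
    return images, masks, labels
```

Generating anomalies on the fly means each epoch sees different defects, which is one way to compensate for the scarcity of real anomaly samples.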
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62102450 and 62272478, and by the Independent Research Project of a Certain Unit under Grant ZZKY20243127.
Abstract: Traditional steganography conceals information by modifying cover data, but steganalysis tools easily detect such alterations. Deep learning-based steganography, in turn, often involves high training costs and complex deployment, and diffusion model-based methods face security vulnerabilities, particularly potential information leakage during generation. We propose a fixed neural network image steganography framework based on secure diffusion models to address these challenges. Unlike conventional approaches, our method minimizes cover modifications through neural network optimization, achieving superior steganographic performance under both human visual perception and computer vision analysis. The cover images are generated in an anime style using state-of-the-art diffusion models, so that the transmitted images appear more natural. This study introduces fixed neural network technology that allows senders to transmit only minimal critical information alongside stego-images. Recipients can accurately reconstruct secret images from this compact data, significantly reducing transmission overhead compared with conventional deep steganography. Furthermore, our framework integrates ElGamal, a public-key cryptographic algorithm, to protect the critical information during transmission, enhancing overall system security and ensuring end-to-end information protection. This dual optimization of payload reduction and cryptographic reinforcement establishes a new paradigm for secure and efficient image steganography.
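For readers unfamiliar with ElGamal, the textbook scheme is short enough to sketch. The toy below encrypts an integer (for instance, hypothetical "critical information" such as a generation seed encoded as bytes) under a public key and recovers it with the private key. The prime 2**127 - 1 and base 3 are illustrative choices only; this is not the paper's implementation and is not secure as written. Real deployments should use a vetted cryptographic library with standardized group parameters and message encoding.

```python
import secrets

# Toy group parameters, for illustration only: 2**127 - 1 is a Mersenne prime
# and 3 is an arbitrary base. Not a vetted or secure parameter choice.
P = 2**127 - 1
G = 3

def keygen():
    """Return (private key x, public key h = G^x mod P)."""
    x = secrets.randbelow(P - 2) + 1          # x in [1, P-2]
    return x, pow(G, x, P)

def encrypt(m, h):
    """Encrypt an integer 0 < m < P under public key h."""
    if not 0 < m < P:
        raise ValueError("message must be an integer in (0, P)")
    k = secrets.randbelow(P - 2) + 1          # fresh ephemeral key per message
    return pow(G, k, P), (m * pow(h, k, P)) % P

def decrypt(c1, c2, x):
    """Recover m = c2 / c1^x mod P."""
    shared = pow(c1, x, P)                    # equals h^k = G^(xk)
    return (c2 * pow(shared, -1, P)) % P      # modular inverse needs Python 3.8+
```

For example, `m = int.from_bytes(b"seed:42", "big")` fits comfortably below P, and `decrypt(*encrypt(m, h), x)` returns `m`.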
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 62476247 and 62072409), the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (Grant No. 2024C01214), and the Zhejiang Provincial Natural Science Foundation (Grant No. LR21F020003).
Abstract: Text-guided molecule generation aims to generate molecules that match given text descriptions. Mainstream methods typically represent molecules with the simplified molecular input line entry system (SMILES) and rely on diffusion models or autoregressive structures for modeling. However, because one molecule can be written as many different SMILES strings (a one-to-many mapping), existing methods require complex model architectures and larger training datasets to improve performance, which reduces the efficiency of model training and generation. In this paper, we propose a text-guided diverse-expression diffusion (TGDD) model for molecule generation. TGDD combines SMILES and self-referencing embedded strings (SELFIES) into a novel diverse-expression molecular representation, enabling precise molecule mapping based on natural language. By leveraging this representation, TGDD simplifies the segmented diffusion generation process, achieving faster training and lower memory consumption while also exhibiting stronger alignment with natural language. TGDD outperforms both TGM-LDM and the autoregressive model MolT5-Base on most evaluation metrics.
Funding: Supported in part by the NSF of China under Grants 62222111 and 62431015, and in part by the Science and Technology Commission Foundation of Shanghai under Grant 24DP1500702.
Abstract: Federated semi-supervised learning (FSSL) faces two major challenges: the scarcity of labeled data across clients and the non-independent and identically distributed (Non-IID) nature of data among clients. To address these issues, we propose diffusion model-based data synthesis aided FSSL (DDSA-FSSL), a novel approach that leverages a diffusion model (DM) to generate synthetic data, thereby bridging the gap between heterogeneous local data distributions and the global data distribution. In DDSA-FSSL, each client addresses the scarcity of labeled data by using a classifier trained with federated learning to pseudo-label its unlabeled data. The DM is then collaboratively trained on both labeled and precision-optimized pseudo-labeled data, enabling clients to generate synthetic samples for classes that are absent from their labeled datasets. As a result, the disparity between local and global distributions is reduced, and clients can create enriched synthetic datasets that better align with the global data distribution. Extensive experiments on various datasets and Non-IID scenarios demonstrate the effectiveness of DDSA-FSSL, with significant performance improvements such as an accuracy increase from 38.46% to 52.14% on CIFAR-10 with 10% labeled data.
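The pseudo-labeling step and the choice of which classes a client should synthesize can be illustrated in a few lines. A minimal sketch, assuming a fixed confidence threshold as a stand-in for the unspecified "precision-optimized" selection; `probs` would come from the federated classifier, and the classes returned by `missing_classes` are the ones a client would ask its diffusion model to generate. Both function names are hypothetical.

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Keep unlabeled samples whose top predicted class probability >= threshold.

    probs: (n_samples, n_classes) softmax outputs of the global classifier.
    Returns (indices of kept samples, their pseudo-labels).
    """
    probs = np.asarray(probs)
    keep = probs.max(axis=1) >= threshold
    return np.flatnonzero(keep), probs.argmax(axis=1)[keep]

def missing_classes(labeled_y, num_classes):
    """Classes with no labeled example on this client (candidates for synthesis)."""
    present = set(np.asarray(labeled_y).tolist())
    return sorted(set(range(num_classes)) - present)
```

Raising the threshold trades the number of pseudo-labels for their precision, which matters here because noisy pseudo-labels would also be used to train the diffusion model.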
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 61806219, 61876189 and 61703426), the Young Talent Fund of University Association for Science and Technology in Shaanxi, China (Nos. 20190108 and 20220106), and the Innovation Talent Supporting Project of Shaanxi, China (No. 2020KJXX-065).
Abstract: Air target intent recognition is important for helping commanders assess battlefield situations and gain a competitive edge in decision-making. Progress in this domain has been hindered by imbalanced battlefield data and the limited robustness of traditional recognition models. Inspired by the success of diffusion models in addressing sample imbalance in the visual domain, this paper introduces a new approach that uses the Markov Transition Field (MTF) method to visualize time series data. Combined with the Denoising Diffusion Probabilistic Model (DDPM), this visualization effectively augments the sample data and mitigates noise in the original dataset. Additionally, a transformer-based model tailored to time series visualization and air target intent recognition is developed. Comprehensive experimental results, including comparative, ablation, and denoising validations, show that the proposed method achieves 98.86% accuracy in air target intent recognition while demonstrating strong robustness and generalization. This approach represents a promising avenue for advancing air target intent recognition.
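The MTF encoding turns a 1-D series into an image by quantizing values into bins, estimating bin-to-bin transition probabilities, and spreading them over every pair of time indices. A minimal sketch of the standard formulation (quantile bins, first-order transitions); the bin count is an assumption, and the paper's exact preprocessing is not given in the abstract.

```python
import numpy as np

def markov_transition_field(x, n_bins=4):
    """Encode a 1-D time series as an (n, n) Markov Transition Field image.

    Each value is assigned to one of n_bins quantile bins; W[i, j] is the
    empirical probability of moving from bin i to bin j in one step, and
    pixel (a, b) of the field is W[bin(x_a), bin(x_b)].
    """
    x = np.asarray(x, dtype=float)
    # Interior quantile edges, so each bin holds roughly the same number of points.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    states = np.digitize(x, edges)  # bin index 0 .. n_bins-1
    counts = np.zeros((n_bins, n_bins))
    for src, dst in zip(states[:-1], states[1:]):
        counts[src, dst] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    w = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return w[states[:, None], states[None, :]]
```

In a multivariate setting, each flight-parameter channel (which channels are used is an assumption here) could be encoded separately and the resulting fields stacked as image channels before augmentation and classification.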
Funding: Supported by the Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2024GXJS014, ZDYF2023GXJS163), the National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022, 62162024), and the Collaborative Innovation Project of Hainan University (XTCX2022XXB02).
Abstract: With the rapid development of Internet of Things technology, the sharp increase in network devices, together with their inherent security vulnerabilities, brings unprecedented challenges to network security, especially in identifying malicious attacks. Network traffic data are unevenly distributed, with imbalance both between attack and normal traffic and between minority-class and majority-class attacks, so traditional machine learning detection algorithms have significant limitations on sparse network traffic data. To tackle this challenge, we design a lightweight intrusion detection model based on diffusion mechanisms, named Diff-IDS, whose core objective is to improve the model's efficiency in parsing complex network traffic features, thereby significantly improving its detection speed and training efficiency. The model first filters network traffic features and converts them into grayscale images, using image flipping for data augmentation. These preprocessed images are then fed into a diffusion model based on the U-Net architecture for training. Once the model is trained, we fix the weights of the U-Net and propose a feature enhancement algorithm based on feature masking to further boost the model's expressiveness. Finally, we devise an end-to-end lightweight detection strategy that streamlines the model, enabling efficient lightweight detection on imbalanced samples. Our method has been tested on well-known network intrusion detection benchmarks, including CICIDS 2017, KDD 99, and NSL-KDD. The experimental results indicate that Diff-IDS leads current state-of-the-art models in detection accuracy, training efficiency, and lightweight metrics, demonstrating strong detection capability and robustness.
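The abstract does not specify how filtered traffic features are laid out as grayscale images. A minimal sketch under assumed choices: per-feature min-max scaling with bounds taken from the training set, row-major placement in the smallest square that fits, zero padding, and a horizontal mirror as the flip augmentation. All names are illustrative.

```python
import math
import numpy as np

def features_to_image(features, mins, maxs):
    """Min-max scale one flow's feature vector to [0, 255] and lay it out
    row by row in the smallest square that fits, zero-padding the tail."""
    features = np.asarray(features, dtype=float)
    span = np.where(maxs > mins, maxs - mins, 1.0)   # avoid division by zero
    scaled = np.clip((features - mins) / span, 0.0, 1.0)
    side = math.ceil(math.sqrt(features.size))
    flat = np.zeros(side * side)
    flat[: features.size] = scaled
    return np.rint(flat.reshape(side, side) * 255).astype(np.uint8)

def flip_augment(image):
    """Return the image together with its horizontal mirror."""
    return [image, np.fliplr(image)]
```

Clipping keeps unseen test values inside the training range, so an unusually large feature saturates at 255 rather than overflowing the 8-bit pixel type.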
Funding: Supported by the Key Project of International Cooperation of Qilu University of Technology (Grant No. QLUTGJHZ2018008), the Shandong Provincial Natural Science Foundation Committee, China (Grant No. ZR2016HB54), and the Shandong Provincial Key Laboratory of Microbial Engineering (SME).
Abstract: AlphaPanda (AlphaFold2 [1] inspired protein-specific antibody design in a diffusional manner) is an advanced algorithm for designing the complementarity-determining regions (CDRs) of antibodies targeting a specific epitope, combining transformer [2] models, a 3D CNN [3], and diffusion [4] generative models.
Funding: Project (2022YFB3707201) supported by the National Key R&D Program of China; Project (U2341254) supported by the Ye Qisun Science Foundation of the National Natural Science Foundation of China; Projects (0604022GH0202143, 0604022SH0201143) supported by the NPU Aoxiang Distinguished Young Scholars, China; Project supported by the Funding of Young Top-notch Talent of the National Ten Thousand Talent Program, China.
Abstract: The effects of ultrasonic bonding parameters on atomic diffusion, the microstructure at the Al-Au interface, and the shear strength of Al-Au ultrasonic bonds were investigated by combining experiments and finite element (FE) simulation. A quantitative model of atomic diffusion, expressed in terms of the ultrasonic bonding parameters, time and distance, is established to calculate atomic diffusion at the Al-Au interface. The maximum relative error between the calculated and experimental fractions of Al atoms is 7.35%, indicating the high prediction accuracy of this model. During ultrasonic bonding, Au8Al3 is the main intermetallic compound (IMC) at the Al-Au interface. With larger bonding forces, higher ultrasonic powers and longer bonding times, it becomes more difficult to remove oxide particles from the Al-Au interface, which hinders atomic diffusion. Consequently, the complicated stress state and the presence of oxide particles both promote the formation of holes. The shear strength of Al-Au ultrasonic bonds increases with increasing bonding force, ultrasonic power and bonding time. However, taking into account the holes that form under certain parameters, the optimal ultrasonic bonding parameters are confirmed to be a bonding force of 23 gf, an ultrasonic power of 75 mW and a bonding time of 21 ms.
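As textbook background only, since the abstract does not give the authors' model (which additionally depends on bonding force and ultrasonic power): constant-diffusivity interdiffusion across a planar interface follows an error-function profile, with an Arrhenius temperature dependence of D. The sketch assumes constant D and ignores stress, intermetallic growth and oxide particles, all of which the study reports as significant; any parameter values passed in are placeholders.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def diffusivity(d0, q, temp_k):
    """Arrhenius law D = D0 * exp(-Q / (R T)); d0 in m^2/s, q in J/mol, temp_k in K."""
    return d0 * math.exp(-q / (R * temp_k))

def couple_profile(x, t, d, c0=1.0):
    """Fick's-second-law solution for a semi-infinite couple with constant D:
    initially c = c0 for x < 0 and c = 0 for x > 0 (x in m, t in s)."""
    return 0.5 * c0 * math.erfc(x / (2.0 * math.sqrt(d * t)))
```

The characteristic penetration depth scales as sqrt(D t), which is why millisecond-scale bonding times only allow very short diffusion distances unless the effective diffusivity is strongly enhanced.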
Funding: Supported by the National Natural Science Foundation of China (NSFC), grant number 42274147.
Abstract: High resolution in post-stack seismic data supports better interpretation of subsurface structures as well as more accurate impedance inversion. Therefore, geophysicists consistently strive to acquire higher-resolution seismic images in petroleum exploration. Although conventional signal processing and machine learning have been applied successfully to post-stack seismic resolution enhancement, seismic applications of the recently emerged and rapidly developing field of generative artificial intelligence remain scarce. Hence, we propose to apply diffusion models, among the most popular generative models, to enhance seismic resolution. Specifically, we apply the classic diffusion model, the denoising diffusion probabilistic model (DDPM), conditioned on low-resolution seismic data, to reconstruct the corresponding high-resolution images. We refer to the entire scheme as SeisResoDiff. To provide a comprehensive and clear understanding of SeisResoDiff, we introduce the basic theory of diffusion models and detail the derivation of the optimization objective with the aid of diagrams and algorithms. For implementation, we first propose a practical workflow for acquiring abundant training data based on generated pseudo-wells. Subsequently, we apply the trained model to both synthetic and field datasets and evaluate the results in three respects: the appearance of seismic sections and slices in the time domain, frequency spectra, and comparisons with synthetic data generated from real well-logging data at the well locations. The results demonstrate not only effective seismic resolution enhancement but also additional denoising by the diffusion model. Experimental comparisons indicate that training the model on noisy data, which are more realistic, outperforms training on clean data. The proposed scheme outperforms some conventional methods in high-resolution reconstruction and denoising ability, and yields more competitive results than our previous research.
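For reference, the standard DDPM quantities behind a conditional scheme such as SeisResoDiff are the closed-form forward (noising) process and the simplified noise-prediction objective. Writing x_0 for a high-resolution section, y for its low-resolution counterpart, beta_t for the noise schedule and epsilon_theta for the network, the usual formulation is given below; it is assumed here, and the paper's exact notation and conditioning may differ.

```latex
\begin{aligned}
\alpha_t &= 1-\beta_t, \qquad \bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s,\\
q(\mathbf{x}_t \mid \mathbf{x}_0) &= \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),\\
\mathcal{L}_{\text{simple}}(\theta) &= \mathbb{E}_{\mathbf{x}_0,\mathbf{y},t,\boldsymbol{\epsilon}}\left[\left\lVert \boldsymbol{\epsilon}-\boldsymbol{\epsilon}_\theta\!\left(\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0+\sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon},\ \mathbf{y},\ t\right)\right\rVert^2\right],
\qquad \boldsymbol{\epsilon}\sim\mathcal{N}(\mathbf{0},\mathbf{I}).
\end{aligned}
```

The closed form for q(x_t | x_0) is what makes training cheap: a noisy sample at any step t is drawn in one shot instead of by iterating the forward chain.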
Funding: Project supported by the National Natural Science Foundation of China (Grant No. T2293771) and the New Cornerstone Science Foundation through the XPLORER PRIZE.
Abstract: With the rapid growth of manuscript submissions, finding eligible reviewers for every submission has become a heavy task. Recommender systems, developed in computer science and information science, are powerful tools for this problem. However, most existing approaches rely on text mining techniques to match manuscripts with potential reviewers, which require high-quality textual information to perform well. In this paper, we propose a reviewer recommendation algorithm based on a network diffusion process on a scholar-paper multilayer network that requires no textual information. The network incorporates scholar-paper relationships, collaboration among scholars, and bibliographic coupling among papers. Experimental results show that our algorithm outperforms state-of-the-art recommendation methods based on graph random walk and matrix factorization, as well as methods based on machine learning and natural language processing, with improvements of over 7.62% in recall, 5.66% in hit rate, and 47.53% in ranking score. Our work sheds light on the effectiveness of multilayer network diffusion-based methods for the reviewer recommendation problem, which will help facilitate the peer-review process and promote information retrieval research in other practical scenarios.
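Network diffusion recommenders typically spread a unit of "resource" across a bipartite graph, each node splitting what it holds equally among its neighbors. The sketch below shows that mass-diffusion step on a single scholar-paper authorship layer only; the paper's multilayer network additionally uses collaboration and bibliographic-coupling links, which are not modeled here. Seeding the resource on papers related to a manuscript (for example, its references) is an assumption for illustration, and the function names are hypothetical.

```python
import numpy as np

def _spread(adj, resource):
    """Each source node (a column of adj) splits its resource equally among
    its neighbours (the rows of adj). Total resource is conserved."""
    resource = np.asarray(resource, dtype=float)
    degree = adj.sum(axis=0)
    share = np.divide(resource, degree, out=np.zeros_like(resource), where=degree > 0)
    return adj @ share

def recommend_reviewers(authorship, seed_papers):
    """Three-step mass diffusion: papers -> scholars -> papers -> scholars.

    authorship: 0/1 array (n_scholars, n_papers); seed_papers: resource per paper.
    Returns a resource score per scholar (higher = stronger candidate).
    """
    scholars = _spread(authorship, seed_papers)   # papers -> their authors
    papers = _spread(authorship.T, scholars)      # authors -> their papers
    return _spread(authorship, papers)            # papers -> their authors
```

In practice the manuscript's own authors (and other conflicts of interest) would be filtered out of the ranking before recommending reviewers.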
Funding: Supported by the National Natural Science Foundation of China (22208328, 22378370 and 22108261) and the Fundamental Research Program of Shanxi Province (20210302124618).
Abstract: In this study, the fluid flow and mixing process in an impinging stream-rotating packed bed (IS-RPB) is simulated using a new three-dimensional computational fluid dynamics model. Specifically, the gas-liquid flow is simulated with the Euler-Euler model, the hydrodynamics of the reactor are predicted with the RNG k-ε model, and the high-gravity environment is simulated with the sliding mesh model. The turbulent mass transfer process is characterized by formulations for the concentration variance c'^2 and its dissipation rate ε_c, so that the turbulent mass diffusivity can be obtained directly. The simulated segregation index X_s agrees with our previous experimental results. The simulated results reveal that the fringe effect of the IS can be offset by the end effect at the inner radius of the RPB, so investigating the coupling mechanism between the IS and the RPB is critical to intensifying the mixing process in the IS-RPB.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 61502162, 61702175, and 61772184; in part by the Fund of the State Key Laboratory of Geo-information Engineering under Grant SKLGIE2016-M-4-2; in part by the Hunan Natural Science Foundation of China under Grant 2018JJ2059; in part by the Key R&D Project of Hunan Province of China under Grant 2018GK2014; in part by the Open Fund of the State Key Laboratory of Integrated Services Networks under Grant ISN17-14; and by the Chinese Scholarship Council (CSC) through the College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, with CSC Grant No. 2018GXZ020784.
Abstract: Compared with Convolutional Neural Networks (CNNs), transformer models have emerged as the dominant networks for various computer vision tasks. Transformers can model long-range dependencies through a self-attention mechanism. This study provides a comprehensive survey of recent transformer-based approaches in image and video applications, as well as diffusion models. We begin by discussing existing surveys of vision transformers and comparing them with this work. Then, we review the main components of a vanilla transformer network, including the self-attention mechanism, feed-forward network, position encoding, etc. In the main part of this survey, we review recent transformer-based models in three categories: Transformer for downstream tasks, Vision Transformer for Generation, and Vision Transformer for Segmentation. We also provide a comprehensive overview of recent transformer models for video tasks and diffusion models. We compare the performance of various hierarchical transformer networks on multiple tasks using popular benchmark datasets. Finally, we explore future research directions to further improve the field.
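The self-attention mechanism mentioned above fits in a few lines. A single-head, unmasked NumPy sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V; the projection matrices would be learned in a real transformer (random placeholders are fine for trying it out), and multi-head splitting and positional encoding are omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (n_tokens, d_model); w_q, w_k: (d_model, d_k); w_v: (d_model, d_v).
    Returns the attended values (n_tokens, d_v) and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise token similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights
```

Because every token attends to every other token in one step, the receptive field is global from the first layer, at the cost of memory quadratic in the number of tokens (image patches, in vision transformers).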
Abstract: We present a new Dirichlet boundary condition for rate-type non-Newtonian diffusive constitutive models. The proposed boundary condition is compared with two well-known and widely used boundary conditions: the pure Neumann condition and the Dirichlet condition of Sureshkumar and Beris. Our condition is shown to be more stable and robust in a number of numerical test cases. The new Dirichlet boundary condition is implemented in the framework of the finite difference Marker and Cell (MAC) method. We also present an energy-stable finite difference MAC scheme that preserves the positivity of the conformation tensor, and show how the addition of diffusion helps energy stability in a finite difference MAC setting.
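As background on how a Dirichlet value enters a cell-centered finite-difference stencil (conformation-tensor components typically sit at cell centers in a MAC layout), a common approach is a ghost cell whose value makes the linear interpolation to the wall equal the prescribed value. The 1D sketch below illustrates that generic technique only; it is not the boundary condition proposed in the paper, whose form the abstract does not give.

```python
import numpy as np

def laplacian_dirichlet_1d(c, h, left, right):
    """Second difference of cell-centred values c (spacing h) with Dirichlet
    values imposed at the two boundary faces through linear ghost cells."""
    c = np.asarray(c, dtype=float)
    ghost_left = 2.0 * left - c[0]     # makes (ghost + c[0]) / 2 == left
    ghost_right = 2.0 * right - c[-1]  # makes (c[-1] + ghost) / 2 == right
    padded = np.concatenate(([ghost_left], c, [ghost_right]))
    return (padded[:-2] - 2.0 * padded[1:-1] + padded[2:]) / h**2
```

For a linear profile whose boundary values match the prescribed ones, the stencil returns zero everywhere, a quick sanity check that the ghost values are consistent with the wall data.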