In this paper, we propose a new privacy-aware transmission scheduling algorithm for 6G ad hoc networks. This system enables end nodes to select the optimum time and scheme to transmit private data safely. In dynamic, heterogeneous 6G infrastructures, unstable links and non-uniform hardware capabilities create critical security and privacy issues. Traditional protocols are often too computationally heavy to allow 6G services to achieve their expected Quality of Service (QoS). Because the transport network is built of ad hoc nodes, there is no guarantee about their trustworthiness or behavior, and transversal functionalities are delegated to the end nodes. However, while security can be guaranteed by end-to-end solutions, privacy cannot, as all intermediate nodes still have to handle the data packets they transport. Moreover, traditional schemes for private anonymous ad hoc communications are vulnerable to modern intelligent attacks based on learning models. The proposed scheme fills this gap. Findings show that, when the proposed technology is used, the probability of a successful intelligent attack is reduced by up to 65% compared to ad hoc networks with no privacy protection strategy, while the congestion probability remains below 0.001%, as required in 6G services.
The collection and annotation of large-scale bird datasets are resource-intensive and time-consuming processes that significantly limit the scalability and accuracy of biodiversity monitoring systems. While self-supervised learning (SSL) has emerged as a promising approach for leveraging unannotated data, current SSL methods face two critical challenges in bird species recognition: (1) long-tailed data distributions that result in poor performance on underrepresented species; and (2) domain shift issues caused by data augmentation strategies designed to mitigate class imbalance. Here we present SDNet, a novel SSL-based bird recognition framework that integrates diffusion models with large language models (LLMs) to overcome these limitations. SDNet employs LLMs to generate semantically rich textual descriptions for tail-class species by prompting the models with species taxonomy, morphological attributes, and habitat information, producing detailed natural language priors that capture fine-grained visual characteristics (e.g., plumage patterns, body proportions, and distinctive markings). These textual descriptions are subsequently used by a conditional diffusion model to synthesize new bird image samples through cross-attention mechanisms that fuse textual embeddings with intermediate visual feature representations during the denoising process, ensuring generated images preserve species-specific morphological details while maintaining photorealistic quality. Additionally, we incorporate a Swin Transformer as the feature extraction backbone, whose hierarchical window-based attention mechanism and shifted windowing scheme enable multi-scale local feature extraction that proves particularly effective at capturing fine-grained discriminative patterns (such as beak shape and feather texture) while mitigating domain shift between synthetic and original images through consistent feature representations across both data sources. SDNet is validated on both a self-constructed dataset (Bird_BXS) and a publicly available benchmark (Birds_25), demonstrating substantial improvements over conventional SSL approaches. Our results indicate that the synergistic integration of LLMs, diffusion models, and the Swin Transformer architecture contributes significantly to recognition accuracy, particularly for rare and morphologically similar species. These findings highlight the potential of SDNet for addressing fundamental limitations of existing SSL methods in avian recognition tasks and establishing a new paradigm for efficient self-supervised learning in large-scale ornithological vision applications.
Crack detection accuracy in computer vision is often constrained by limited annotated datasets. Although Generative Adversarial Networks (GANs) have been applied for data augmentation, they frequently introduce blurs and artifacts. To address this challenge, this study leverages Denoising Diffusion Probabilistic Models (DDPMs) to generate high-quality synthetic crack images, enriching the training set with diverse and structurally consistent samples that enhance crack segmentation. The proposed framework involves a two-stage pipeline. First, DDPMs are used to synthesize high-fidelity crack images that capture fine structural details. Second, these generated samples are combined with real data to train segmentation networks, thereby improving accuracy and robustness in crack detection. Compared with GAN-based approaches, DDPM achieved the best fidelity, with the highest Structural Similarity Index (SSIM, 0.302) and lowest Learned Perceptual Image Patch Similarity (LPIPS, 0.461), producing artifact-free images that preserve fine crack details. To validate its effectiveness, six segmentation models were tested, among which LinkNet consistently achieved the best performance, excelling in both region-level accuracy and structural continuity. Incorporating DDPM-augmented data further enhanced segmentation outcomes, increasing F1 scores by up to 1.1% and IoU by 1.7%, while also improving boundary alignment and skeleton continuity compared with models trained on real images alone. Experiments with varying augmentation ratios showed consistent improvements, with F1 rising from 0.946 (no augmentation) to 0.957 and IoU from 0.897 to 0.913 at the highest ratio. These findings demonstrate the effectiveness of diffusion-based augmentation for complex crack detection in structural health monitoring.
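The F1 and IoU figures quoted in the crack-segmentation abstract above are region-level overlap scores. The following is a minimal sketch of how such scores are commonly computed for a pair of binary masks. The function name `f1_and_iou` and the empty-mask convention are assumptions for illustration, and the abstract does not say how the authors aggregate scores over images.

```python
import numpy as np

def f1_and_iou(pred, target):
    """Return (F1, IoU) for two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()    # crack pixels found
    fp = np.logical_and(pred, ~target).sum()   # false alarms
    fn = np.logical_and(~pred, target).sum()   # missed crack pixels
    # Assumed convention: two empty masks count as a perfect match.
    if tp + fp + fn == 0:
        return 1.0, 1.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return float(f1), float(iou)
```

For a single mask pair the two scores are tied by F1 = 2·IoU / (1 + IoU), so they always move in the same direction; averages over a dataset need not satisfy the identity exactly.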
Human motion modeling is a core technology in computer animation, game development, and human-computer interaction. In particular, generating natural and coherent in-between motion using only the initial and terminal frames remains a fundamental yet unresolved challenge. Existing methods typically rely on dense keyframe inputs or complex prior structures, making it difficult to balance motion quality and plausibility under conditions such as sparse constraints, long-term dependencies, and diverse motion styles. To address this, we propose a motion generation framework based on a frequency-domain diffusion model, which aims to better model complex motion distributions and enhance generation stability under sparse conditions. Our method maps motion sequences to the frequency domain via the Discrete Cosine Transform (DCT), enabling more effective modeling of low-frequency motion structures while suppressing high-frequency noise. A denoising network based on self-attention is introduced to capture long-range temporal dependencies and improve global structural awareness. Additionally, a multi-objective loss function is employed to jointly optimize motion smoothness, pose diversity, and anatomical consistency, enhancing the realism and physical plausibility of the generated sequences. Comparative experiments on the Human3.6M and LaFAN1 datasets demonstrate that our method outperforms state-of-the-art approaches across multiple performance metrics, showing stronger capabilities in generating intermediate motion frames. This research offers a new perspective and methodology for human motion generation and holds promise for applications in character animation, game development, and virtual interaction.
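The frequency-domain mapping described in the motion-generation abstract above can be illustrated with a small sketch. It applies an orthonormal DCT-II along the time axis of a motion array and optionally zeroes the high-frequency coefficients. The array layout `(frames, features)`, the function names, and the `keep` parameter are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k holds the k-th cosine frequency."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis

def to_frequency(motion, keep=None):
    """Transform (frames, features) motion along time; keep only the
    first `keep` low-frequency coefficients when `keep` is given."""
    coeffs = dct_matrix(motion.shape[0]) @ motion
    if keep is not None:
        coeffs[keep:] = 0.0  # suppress high-frequency content
    return coeffs

def to_time(coeffs):
    """Inverse transform; the basis is orthonormal, so its transpose inverts it."""
    return dct_matrix(coeffs.shape[0]).T @ coeffs
```

Without truncation the round trip `to_time(to_frequency(x))` reproduces `x` up to floating-point error; with a small `keep` it returns a smoothed trajectory, which is the property the abstract relies on when it speaks of modeling low-frequency structure.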
With the development of technology, diffusion model-based solvers have shown significant promise in solving Combinatorial Optimization (CO) problems, particularly in tackling Non-deterministic Polynomial-time hard (NP-hard) problems such as the Traveling Salesman Problem (TSP). However, existing diffusion model-based solvers typically employ a fixed, uniform noise schedule (e.g., linear or cosine annealing) across all training instances, failing to fully account for the unique characteristics of each problem instance. To address this challenge, we present Graph-Guided Diffusion Solvers (GGDS), an enhanced method for improving graph-based diffusion models. GGDS leverages Graph Neural Networks (GNNs) to capture graph structural information embedded in node coordinates and adjacency matrices, dynamically adjusting the noise levels in the diffusion model. This study investigates the TSP by examining two distinct time-step noise generation strategies: cosine annealing and a Neural Network (NN)-based approach. We evaluate their performance across different problem scales, particularly after integrating graph structural information. Experimental results indicate that GGDS outperforms previous methods with average performance improvements of 18.7%, 6.3%, and 88.7% on TSP-500, TSP-100, and TSP-50, respectively. Specifically, GGDS demonstrates superior performance on TSP-500 and TSP-50, while its performance on TSP-100 is either comparable to or slightly better than that of previous methods, depending on the chosen noise schedule and decoding strategy.
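To make concrete what a "fixed, uniform noise schedule" means in the abstract above, the sketch below builds the linear and cosine schedules that are widely used in diffusion models. Every instance receives the same per-step noise levels, which is the limitation GGDS targets. The default constants are common choices from the image-diffusion literature and are assumptions here; the abstract does not specify the values used, and the graph-conditioned adjustment itself is not shown.

```python
import numpy as np

def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances growing linearly with the time step."""
    return np.linspace(beta_start, beta_end, steps)

def cosine_betas(steps, s=0.008, max_beta=0.999):
    """Per-step noise variances derived from a cosine-shaped signal curve."""
    t = np.arange(steps + 1) / steps
    alpha_bar = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = alpha_bar / alpha_bar[0]          # normalise so the curve starts at 1
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)          # avoid a singular final step

def signal_level(betas):
    """Cumulative fraction of the original signal retained after each step."""
    return np.cumprod(1.0 - betas)
```

Both functions depend only on the number of steps, never on the problem instance; an instance-aware variant would replace these fixed arrays with values predicted from the graph.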
A solute diffusion controlled solidification model was used to simulate the initial-stage cellular-to-dendrite transition of Ti44Al alloys during directional solidification at different velocities. The simulation results show that a mixed structure composed of cells and dendrites forms during this process, in which secondary dendrites are absent at the facing surfaces of parallel, closely spaced dendrites, in agreement with previous experimental observation. The dendrite spacings are larger than the cellular spacings at a given rate, and the columnar grain spacing increases sharply to a maximum as solidification advances to the coexistence zone. In addition, the simulation revealed that decreasing the number of seeds increases the tendency toward an unstable dendrite transition. Finally, the main factors affecting the cell/dendrite transition were analyzed; the principal one may be changes in growth rate, which cause slight fluctuations of the liquid composition at the growth front. The simulation results are in reasonable agreement with previous theoretical models and experimental observations at low cooling rates.
Aim: The purpose of this study was to develop a mathematical model to quantitatively describe the passive transport of macromolecules within dental biofilms. Methodology: Fluorescently labeled dextrans with different molecular masses (3 kD, 10 kD, 40 kD, 70 kD, 2,000 kD) were used as a series of diffusion probes. Streptococcus mutans, Streptococcus sanguinis, Actinomyces naeslundii and Fusobacterium nucleatum were used as inocula for biofilm formation. The diffusion processes of the different probes through the in vitro biofilm were recorded with a confocal laser microscope. Results: A mathematical function of biofilm penetration was constructed on the basis of the inverse problem method. Based on this function, both the relationship between the average steady-state concentration and molecular weight and the relationship between penetration time and molecular weight can be analyzed. Conclusion: This function can be used to predict the effective concentration and the penetration time of anti-biofilm medicines that can diffuse through oral biofilm. Furthermore, an improved model for large molecules is proposed by considering the exchange time at the upper boundary of the dental biofilm.
Polyvinylpyrrolidone K-30 (PVP) was introduced into the preparation of nano zero-valent iron (nZVI), improving and simplifying the traditional liquid-phase reduction method. The nZVI prepared with this new approach showed excellent surface characteristics and high performance in the removal of cadmium. TEM results showed that the nZVI aggregates can reach several micrometers in length but are less than 100 nm in diameter. The iron particles were enclosed by an oxide film less than 10 nm thick, demonstrating that the nZVI possesses a core–shell structure. BET results indicate that the specific surface area of the nZVI was 20.3159 m^2 g^-1. A three-factor, three-level orthogonal experiment was employed to identify the dominant factor affecting the removal rate of cadmium by nZVI. Based on the range values, the order of importance of the factors was: initial pH of the solution > initial concentration of cadmium > dosage of nZVI, with ranges of 96.453, 3.294 and 1.747, respectively. A simulation performed under the same conditions led to the same conclusion; this consistency confirms that pH is the most significant factor affecting the adsorption efficiency.
In this study, the removal of monovalent and divalent cations, Na⁺, K⁺, Mg²⁺, and Ca²⁺, in a diluted solution from Chott-El Jerid Lake, Tunisia, was investigated with the electrodialysis technique. The process was tested using two cation-exchange membranes: sulfonated polyether sulfone cross-linked with 10% hexamethylenediamine (HEXCl) and sulfonated polyether sulfone grafted with octylamine (S-PESOS). The commercially available membrane Nafion® was used for comparison. The results showed that Nafion® and S-PESOS membranes had similar removal behaviors, and the investigated cations were ranked in the following descending order in terms of their demineralization rates: Na⁺ > Ca²⁺ > Mg²⁺ > K⁺. HEXCl removed divalent cations more effectively than monovalent cations. The plots based on the Weber–Morris model showed strong linearity. This reveals that intra-particle diffusion was not the removal rate-determining step, and the removal process was controlled by two or more concurrent mechanisms. The Boyd plots did not pass through the origin, and the sole controlling step was determined by film-diffusion resistance, especially after a long period of electrodialysis. Additionally, a semi-empirical model was established to simulate the temporal variation of the treatment process, and the physical significance and values of the model parameters were compared for the three membranes. The findings of this study indicate that HEXCl and S-PESOS membranes can be efficiently utilized for water softening, especially when effluents are highly loaded with calcium and magnesium ions.
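The Weber–Morris analysis mentioned in the electrodialysis abstract above is usually a straight-line fit of the removed quantity q against the square root of time, q = k_id·√t + C. The sketch below shows such a fit under that standard form. The function name and the use of `numpy.polyfit` are assumptions for illustration; the abstract does not give the authors' fitting procedure or data.

```python
import numpy as np

def fit_weber_morris(t, q):
    """Fit q = k_id * sqrt(t) + C and return (k_id, C, R^2)."""
    x = np.sqrt(np.asarray(t, dtype=float))
    q = np.asarray(q, dtype=float)
    k_id, intercept = np.polyfit(x, q, 1)      # slope first, then intercept
    residual = q - (k_id * x + intercept)
    total = q - q.mean()
    r2 = 1.0 - (residual @ residual) / (total @ total)
    return float(k_id), float(intercept), float(r2)
```

In the usual reading of this model, a line through the origin (C close to zero) points to intra-particle diffusion as the sole rate-determining step, while a clearly nonzero intercept suggests that other mechanisms, such as film diffusion, also contribute.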
The inner surface modification process by plasma-based low-energy ion implantation (PBLEII) with an electron cyclotron resonance (ECR) microwave plasma source located at the central axis of a cylindrical tube is modeled to optimize the low-energy ion implantation parameters for industrial applications. In this paper, a magnetized plasma diffusion fluid model has been established to describe the plasma nonuniformity caused by plasma diffusion under an axial magnetic field during the pulse-off time of low pulsed negative bias. Using this plasma density distribution as the initial condition, a sheath collisional fluid model is built to describe the sheath evolution and ion implantation during the pulse-on time. The plasma nonuniformity at the end of the pulse-off time is more apparent along the radial direction than along the axial direction, owing to the geometry of the linear plasma source in the center and the difference between the perpendicular and parallel plasma diffusion coefficients with respect to the magnetic field. The normalized nitrogen plasma densities on the inner and outer surfaces of the tube are observed to be about 0.39 and 0.24, respectively, relative to a value of 1 at the central plasma source. After a 5 μs pulse-on time, in the area less than 2 cm from the end of the tube, the nitrogen ion implantation energy decreases from 1.5 keV to 1.3 keV and the ion implantation angle increases from several degrees to more than 40°; both variations reduce the nitrogen ion implantation depth. However, the nitrogen ion implantation dose peaks of about 2×10^10 to 7×10^10 ions/cm^2 in this area are 2 to 4 times higher than the doses of 1.18×10^10 ions/cm^2 and 1.63×10^10 ions/cm^2 on the inner and outer surfaces of the tube. The sufficient ion implantation dose ensures an acceptable modification effect near the end of the tube under the low-energy and large-angle conditions for nitrogen ion implantation, because the modification effect is mainly determined by the ion implantation dose, just as the mass transfer process in PBLEII is dominated by low-energy ion implantation and thermal diffusion. Therefore, a comparatively uniform surface modification by low-energy nitrogen ion implantation is achieved along the cylindrical tube on both the inner and outer surfaces.
Accurately identifying building distribution from remote sensing images with complex background information is challenging. The emergence of diffusion models has prompted the innovative idea of employing the reverse denoising process to distill building distribution from these complex backgrounds. Building on this concept, we propose a novel framework, the building extraction diffusion model (BEDiff), which meticulously refines the extraction of building footprints from remote sensing images in a stepwise fashion. Our approach begins with the design of booster guidance, a mechanism that extracts structural and semantic features from remote sensing images to serve as priors, thereby providing targeted guidance for the diffusion process. Additionally, we introduce a cross-feature fusion module (CFM) that bridges the semantic gap between different types of features, integrating the attributes extracted by booster guidance into the diffusion process more effectively. Our proposed BEDiff marks the first application of diffusion models to the task of building extraction. Empirical evidence from extensive experiments on the Beijing building dataset demonstrates the superior performance of BEDiff, affirming its effectiveness and potential for enhancing the accuracy of building extraction in complex urban landscapes.
Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems (ITS) in the real world. As a state-of-the-art generative model, the diffusion model has proven highly successful in image generation, speech generation, time series modelling, etc., and now opens a new avenue for traffic data imputation. In this paper, we propose a conditional diffusion model, called the implicit-explicit diffusion model, for traffic data imputation. This model exploits both the implicit and explicit features of the data simultaneously. More specifically, we design two types of feature extraction modules, one to capture the implicit dependencies hidden in the raw data at multiple time scales and the other to obtain the long-term temporal dependencies of the time series. This approach not only inherits the advantages of the diffusion model for estimating missing data, but also takes into account the multiscale correlation inherent in traffic data. To illustrate the performance of the model, extensive experiments are conducted on three real-world time series datasets using different missing rates. The experimental results demonstrate that the model improves imputation accuracy and generalization capability.
Recently, diffusion models have emerged as a promising paradigm for molecular design and optimization. However, most diffusion-based molecular generative models focus on modeling 2D graphs or 3D geometries, with limited research on molecular sequence diffusion models. The International Union of Pure and Applied Chemistry (IUPAC) names are more akin to chemical natural language than the simplified molecular input line entry system (SMILES) for organic compounds. In this work, we apply an IUPAC-guided conditional diffusion model to facilitate molecular editing from chemical natural language to chemical language (SMILES) and explore whether the pre-trained generative performance of diffusion models can be transferred to chemical natural language. We propose DiffIUPAC, a controllable molecular editing diffusion model that converts IUPAC names to SMILES strings. Evaluation results demonstrate that our model outperforms existing methods and successfully captures the semantic rules of both chemical languages. Chemical space and scaffold analysis show that the model can generate similar compounds with diverse scaffolds within the specified constraints. Additionally, to illustrate the model's applicability in drug design, we conducted case studies in functional group editing, analogue design and linker design.
With the rapid development of Internet of Things technology, the sharp increase in network devices and their inherent security vulnerabilities present a stark contrast, bringing unprecedented challenges to the field of network security, especially in identifying malicious attacks. However, due to the uneven distribution of network traffic data, particularly the imbalance between attack traffic and normal traffic, as well as the imbalance between minority class attacks and majority class attacks, traditional machine learning detection algorithms have significant limitations when dealing with sparse network traffic data. To effectively tackle this challenge, we have designed a lightweight intrusion detection model based on diffusion mechanisms, named Diff-IDS, with the core objective of enhancing the model's efficiency in parsing complex network traffic features, thereby significantly improving its detection speed and training efficiency. The model begins by finely filtering network traffic features and converting them into grayscale images, while also employing image-flipping techniques for data augmentation. Subsequently, these preprocessed images are fed into a diffusion model based on the U-Net architecture for training. Once the model is trained, we fix the weights of the U-Net network and propose a feature enhancement algorithm based on feature masking to further boost the model's expressiveness. Finally, we devise an end-to-end lightweight detection strategy to streamline the model, enabling efficient lightweight detection of imbalanced samples. Our method has been subjected to multiple experimental tests on renowned network intrusion detection benchmarks, including CICIDS 2017, KDD 99, and NSL-KDD. The experimental results indicate that Diff-IDS leads in terms of detection accuracy, training efficiency, and lightweight metrics compared to the current state-of-the-art models, demonstrating exceptional detection capabilities and robustness.
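The preprocessing step in the Diff-IDS abstract above, turning traffic features into grayscale images and flipping them for augmentation, might look roughly like the sketch below. The per-sample min-max scaling, the zero padding, and the function names are assumptions for illustration; the abstract does not describe the authors' exact feature filtering or image layout.

```python
import numpy as np

def features_to_image(features, side):
    """Scale one feature vector to 0..255 and lay it out as a side x side image.

    Assumed scheme: min-max scaling within the sample, zero padding at the end.
    """
    x = np.asarray(features, dtype=float)
    lo, hi = x.min(), x.max()
    scale = (hi - lo) if hi > lo else 1.0       # guard against constant vectors
    pixels = np.round((x - lo) / scale * 255).astype(np.uint8)
    padded = np.zeros(side * side, dtype=np.uint8)
    n = min(pixels.size, side * side)           # truncate if the vector is too long
    padded[:n] = pixels[:n]
    return padded.reshape(side, side)

def flip_augment(image):
    """Return the image with its horizontal and vertical mirror copies."""
    return [image, np.fliplr(image), np.flipud(image)]
```

In practice the scaling range would more likely be computed per feature over the training set, so that the same pixel intensity means the same feature value across samples.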
Air target intent recognition holds significant importance in aiding commanders to assess battlefield situations and secure a competitive edge in decision-making. Progress in this domain has been hindered by challenges posed by imbalanced battlefield data and the limited robustness of traditional recognition models. Inspired by the success of diffusion models in addressing visual domain sample imbalances, this paper introduces a new approach that utilizes the Markov Transfer Field (MTF) method for time series data visualization. This visualization, when combined with the Denoising Diffusion Probabilistic Model (DDPM), effectively enhances sample data and mitigates noise within the original dataset. Additionally, a transformer-based model tailored for time series visualization and air target intent recognition is developed. Comprehensive experimental results, encompassing comparative, ablation, and denoising validations, reveal that the proposed method achieves a notable 98.86% accuracy in air target intent recognition while demonstrating exceptional robustness and generalization capabilities. This approach represents a promising avenue for advancing air target intent recognition.
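The MTF visualization named in the abstract above is more commonly called a Markov Transition Field in the time-series literature. A minimal sketch of the usual construction follows: quantize the series into bins, estimate first-order transition probabilities between bins, and spread those probabilities over every pair of time steps. The quantile binning, the default `n_bins`, and the function name are assumptions; the abstract does not state the authors' settings.

```python
import numpy as np

def markov_transition_field(series, n_bins=4):
    """Return an (n, n) image whose entry (a, b) is the transition
    probability from the bin of series[a] to the bin of series[b]."""
    x = np.asarray(series, dtype=float)
    # Interior quantile edges give roughly equal-population bins.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    states = np.digitize(x, edges)              # integer bin index, 0..n_bins-1
    counts = np.zeros((n_bins, n_bins))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1                       # count consecutive transitions
    row_sums = counts.sum(axis=1, keepdims=True)
    transition = np.divide(counts, row_sums,
                           out=np.zeros_like(counts), where=row_sums > 0)
    return transition[states[:, None], states[None, :]]
```

The result is a square image with one row and column per time step, which is what allows image-oriented models such as a DDPM or a vision transformer to be applied to one-dimensional trajectories.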
The application of generative artificial intelligence (AI) is bringing about notable changes in anime creation. This paper surveys recent advancements and applications of diffusion and language models in anime generation, focusing on their demonstrated potential to enhance production efficiency through automation and personalization. Despite these benefits, it is crucial to acknowledge the substantial initial computational investments required for training and deploying these models. We conduct an in-depth survey of cutting-edge generative AI technologies, encompassing models such as Stable Diffusion and GPT, and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics. Review of the surveyed literature indicates the achievement of considerable maturity in the capacity of AI models to synthesize high-quality, aesthetically compelling anime visual images from textual prompts, alongside discernible progress in the generation of coherent narratives. However, achieving perfect long-form consistency, mitigating artifacts like flickering in video sequences, and enabling fine-grained artistic control remain critical ongoing challenges. Building upon these advancements, research efforts have increasingly pivoted towards the synthesis of higher-dimensional content, such as video and three-dimensional assets, with recent studies demonstrating significant progress in this burgeoning field. Nevertheless, formidable challenges endure amidst these advancements. Foremost among these are the substantial computational exigencies requisite for training and deploying these sophisticated models, particularly pronounced in the realm of high-dimensional generation such as video synthesis. Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and mitigating ethical considerations surrounding bias and the preservation of human creative autonomy. This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries. We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models, the integration of multimodal inputs, and the balanced consideration of ethical implications, particularly regarding bias and the preservation of human creative autonomy, thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
Deep learning has achieved great progress in image recognition, segmentation, semantic recognition and game theory. In this study, a recent deep learning model, the conditional diffusion model, was adopted as a surrogate model to predict the heat transfer during the casting process in place of numerical simulation. The conditional diffusion model was established and trained with the geometry shapes, initial temperature fields and temperature fields at t_i as the condition and random noise sampled from a standard normal distribution as the input. The output was the temperature field at t_{i+1}. Therefore, the temperature field at t_{i+1} can be predicted once the temperature field at t_i is known, and the continuous temperature fields of all the time steps can be predicted from the initial temperature field of an arbitrary 2D geometry. A training set with 302 2D shapes and their simulated temperature fields at different time steps was established. The accuracy of the temperature field for a single time step reaches 97.7%, and that for continuous time steps reaches 69.1%, with the main error occurring in the sand mold. The effect of the geometry shape and the initial temperature field on the prediction accuracy was investigated; the former achieves a better result than the latter because it can identify the casting, mold and chill by different colors in the input images. The diffusion model has shown its potential as a surrogate model for numerical simulation of the casting process.
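The step-by-step prediction described in the casting abstract above, where each predicted field becomes the condition for the next step, can be sketched as an autoregressive rollout. The `step_fn` interface and the `toy_step` stand-in below are hypothetical placeholders for the trained conditional diffusion model, which the abstract does not specify in code form.

```python
import numpy as np

def rollout(step_fn, geometry, initial_field, n_steps):
    """Predict n_steps temperature fields by feeding each output back in.

    step_fn(geometry, initial_field, current_field) -> next field
    (hypothetical interface standing in for one sampling pass of the model).
    """
    fields = [np.asarray(initial_field, dtype=float)]
    for _ in range(n_steps):
        fields.append(step_fn(geometry, fields[0], fields[-1]))
    return np.stack(fields)                     # shape: (n_steps + 1, H, W)

def toy_step(geometry, initial_field, current_field):
    """Toy stand-in: cool by 10% per step wherever the geometry mask is 1."""
    return current_field * (1.0 - 0.1 * geometry)
```

Because every step consumes the previous prediction rather than a simulated field, errors accumulate along the rollout, which is consistent with the gap the abstract reports between single-step accuracy (97.7%) and continuous-step accuracy (69.1%).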
Although existing style transfer techniques have made significant progress in the field of image generation, there are still some challenges in the field of exhibition hall design. Existing style transfer methods mainly focus on the transformation of single-dimensional features, but ignore the deep integration of content and style features in exhibition hall design. In addition, existing methods are deficient in detail retention, especially in accurately capturing and reproducing local textures and details while preserving the content image structure. Furthermore, point-based attention mechanisms tend to ignore the complexity and diversity of image features in multi-dimensional space, causing alignment problems between features in different semantic areas and, in turn, inconsistent stylistic features in content areas. In this context, this paper proposes a semantic-enhanced multimodal style transfer algorithm tailored for exhibition hall design. The proposed approach leverages a multimodal encoder architecture to integrate information from text, source images, and style images, using separate encoder modules for each modality to capture shallow, deep, and semantic features. A novel Style Transfer Convolution (STConv) kernel, based on the Visual Geometry Group (VGG) 19 network, is introduced to improve feature extraction in style transfer. Additionally, an enhanced Transformer encoder is incorporated to capture contextual semantic information within images, while the CLIP model is employed for text data processing. A hybrid attention module is designed to precisely capture style features, achieving multimodal feature fusion via a diffusion model that generates exhibition hall design images aligned with stylistic requirements. Quantitative experiments show that, compared with the most advanced algorithms, the proposed method achieves significant performance improvements on both the Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) metrics. For example, on the ExpoArchive dataset, the proposed method has an FID value of 87.9 and a KID value of 1.98, which is significantly superior to other methods.
Fire detection has been an important problem in computer vision for over half a century. The development of early fire detection strategies is pivotal to realizing safe, smart cities that remain habitable in the future. However, the development of optimal fire and smoke detection models is hindered by the limitations of publicly available datasets, including lack of diversity and class imbalance. In this work, we explore possible ways to overcome the challenges posed by the available datasets. We study the impact of a class-balanced dataset on the fire detection capability of state-of-the-art (SOTA) vision-based models and propose the use of generative models for data augmentation as a direction for future work. First, a comparative analysis of two prominent object detection architectures, You Only Look Once version 7 (YOLOv7) and YOLOv8, is carried out using a balanced dataset, with both models evaluated on metrics including precision, recall, and mean Average Precision (mAP). The results are compared with other recent fire detection models, highlighting the superior performance and efficiency of the proposed YOLOv8 architecture as trained on our balanced dataset. Next, a fractal dimension analysis gives deeper insight into the repetition of patterns in fire, and the effectiveness of the results is demonstrated by a windowing-based inference approach. The proposed Slicing-Aided Hyper Inference (SAHI) improves the fire and smoke detection capability of YOLOv8 for real-life applications, with a significantly improved mAP under a strict confidence threshold. YOLOv8 with SAHI inference gives a mAP:50-95 improvement of more than 25% compared to the base YOLOv8 model. The study also provides insights into future work by exploring the potential of generative models such as the deep convolutional generative adversarial network (DCGAN) and diffusion models such as Stable Diffusion for data augmentation.
Funding: Funding from the European Commission through the Ruralities project (grant agreement no. 101060876).
Abstract: In this paper, we propose a new privacy-aware transmission scheduling algorithm for 6G ad hoc networks. This system enables end nodes to select the optimum time and scheme to transmit private data safely. In dynamic, heterogeneous 6G infrastructures, unstable links and non-uniform hardware capabilities create critical security and privacy issues. Traditional protocols are often too computationally heavy to allow 6G services to achieve their expected Quality of Service (QoS). As the transport network is built of ad hoc nodes, there is no guarantee about their trustworthiness or behavior, and transversal functionalities are delegated to the end nodes. However, while security can be guaranteed in end-to-end solutions, privacy cannot, as all intermediate nodes still have to handle the data packets they transport. Moreover, traditional schemes for private anonymous ad hoc communications are vulnerable to modern intelligent attacks based on learning models. The proposed scheme fills this gap. Findings show that, with the proposed technology, the probability of a successful intelligent attack is reduced by up to 65% compared to ad hoc networks with no privacy protection strategy, while the congestion probability can remain below 0.001%, as required in 6G services.
Funding: Supported by the National Natural Science Foundation of China (32471964).
Abstract: The collection and annotation of large-scale bird datasets are resource-intensive and time-consuming processes that significantly limit the scalability and accuracy of biodiversity monitoring systems. While self-supervised learning (SSL) has emerged as a promising approach for leveraging unannotated data, current SSL methods face two critical challenges in bird species recognition: (1) long-tailed data distributions that result in poor performance on underrepresented species; and (2) domain shift issues caused by data augmentation strategies designed to mitigate class imbalance. Here we present SDNet, a novel SSL-based bird recognition framework that integrates diffusion models with large language models (LLMs) to overcome these limitations. SDNet employs LLMs to generate semantically rich textual descriptions for tail-class species by prompting the models with species taxonomy, morphological attributes, and habitat information, producing detailed natural language priors that capture fine-grained visual characteristics (e.g., plumage patterns, body proportions, and distinctive markings). These textual descriptions are subsequently used by a conditional diffusion model to synthesize new bird image samples through cross-attention mechanisms that fuse textual embeddings with intermediate visual feature representations during the denoising process, ensuring that generated images preserve species-specific morphological details while maintaining photorealistic quality. Additionally, we incorporate a Swin Transformer as the feature extraction backbone. Its hierarchical window-based attention mechanism and shifted windowing scheme enable multi-scale local feature extraction that proves particularly effective at capturing fine-grained discriminative patterns (such as beak shape and feather texture) while mitigating domain shift between synthetic and original images through consistent feature representations across both data sources. SDNet is validated on both a self-constructed dataset (Bird_BXS) and a publicly available benchmark (Birds_25), demonstrating substantial improvements over conventional SSL approaches. Our results indicate that the synergistic integration of LLMs, diffusion models, and the Swin Transformer architecture contributes significantly to recognition accuracy, particularly for rare and morphologically similar species. These findings highlight the potential of SDNet for addressing fundamental limitations of existing SSL methods in avian recognition tasks and establishing a new paradigm for efficient self-supervised learning in large-scale ornithological vision applications.
Funding: The National Natural Science Foundation of China (Grant No. 52508343); the Fundamental Research Funds for the Central Universities (Grant No. B250201004).
Abstract: Crack detection accuracy in computer vision is often constrained by limited annotated datasets. Although Generative Adversarial Networks (GANs) have been applied for data augmentation, they frequently introduce blur and artifacts. To address this challenge, this study leverages Denoising Diffusion Probabilistic Models (DDPMs) to generate high-quality synthetic crack images, enriching the training set with diverse and structurally consistent samples that enhance crack segmentation. The proposed framework is a two-stage pipeline: first, DDPMs are used to synthesize high-fidelity crack images that capture fine structural details. Second, these generated samples are combined with real data to train segmentation networks, thereby improving accuracy and robustness in crack detection. Compared with GAN-based approaches, DDPM achieved the best fidelity, with the highest Structural Similarity Index (SSIM) (0.302) and the lowest Learned Perceptual Image Patch Similarity (LPIPS) (0.461), producing artifact-free images that preserve fine crack details. To validate its effectiveness, six segmentation models were tested, among which LinkNet consistently achieved the best performance, excelling in both region-level accuracy and structural continuity. Incorporating DDPM-augmented data further enhanced segmentation outcomes, increasing F1 scores by up to 1.1% and IoU by 1.7%, while also improving boundary alignment and skeleton continuity compared with models trained on real images alone. Experiments with varying augmentation ratios showed consistent improvements, with F1 rising from 0.946 (no augmentation) to 0.957 and IoU from 0.897 to 0.913 at the highest ratio. These findings demonstrate the effectiveness of diffusion-based augmentation for complex crack detection in structural health monitoring.
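As a reading aid for the F1 and IoU figures quoted above, the sketch below computes both metrics for a single binary crack mask. It is an illustration under assumptions, not the authors' evaluation code: the function name `f1_and_iou`, the flat 0/1 mask encoding, and the convention of returning 1.0 when both masks are empty are all hypothetical choices.

```python
from typing import Sequence, Tuple


def f1_and_iou(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Return (F1, IoU) for two flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = fp = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1          # crack pixel correctly predicted
        elif p and not t:
            fp += 1          # background predicted as crack
        elif t and not p:
            fn += 1          # crack pixel missed
    if tp + fp + fn == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


# Hypothetical 5-pixel example (1 = crack, 0 = background).
example_f1, example_iou = f1_and_iou([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

For a single mask the two metrics are tied by F1 = 2·IoU / (1 + IoU), so they always move in the same direction. Scores averaged over a dataset, such as those reported in the abstract, need not satisfy this identity exactly.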
Funding: Supported by the National Natural Science Foundation of China (Grant No. 72161034).
Abstract: Human motion modeling is a core technology in computer animation, game development, and human-computer interaction. In particular, generating natural and coherent in-between motion using only the initial and terminal frames remains a fundamental yet unresolved challenge. Existing methods typically rely on dense keyframe inputs or complex prior structures, making it difficult to balance motion quality and plausibility under conditions such as sparse constraints, long-term dependencies, and diverse motion styles. To address this, we propose a motion generation framework based on a frequency-domain diffusion model, which aims to better model complex motion distributions and enhance generation stability under sparse conditions. Our method maps motion sequences to the frequency domain via the Discrete Cosine Transform (DCT), enabling more effective modeling of low-frequency motion structures while suppressing high-frequency noise. A denoising network based on self-attention is introduced to capture long-range temporal dependencies and improve global structural awareness. Additionally, a multi-objective loss function is employed to jointly optimize motion smoothness, pose diversity, and anatomical consistency, enhancing the realism and physical plausibility of the generated sequences. Comparative experiments on the Human3.6M and LaFAN1 datasets demonstrate that our method outperforms state-of-the-art approaches across multiple performance metrics, showing stronger capabilities in generating intermediate motion frames. This research offers a new perspective and methodology for human motion generation and holds promise for applications in character animation, game development, and virtual interaction.
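The frequency-domain step described above can be sketched as follows. This is a minimal illustration assuming an orthonormal DCT-II applied to a single joint-angle trajectory. The helper names (`dct2`, `idct2`, `low_pass`), the number of retained coefficients, and the toy trajectory are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def dct2(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c: Sequence[float]) -> List[float]:
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(s)
    return out


def low_pass(x: Sequence[float], keep: int) -> List[float]:
    """Keep the first `keep` DCT coefficients and zero the rest."""
    coeffs = dct2(x)
    truncated = [v if k < keep else 0.0 for k, v in enumerate(coeffs)]
    return idct2(truncated)


# Hypothetical joint-angle trajectory: a slow swing plus frame-to-frame jitter.
frames = 32
trajectory = [math.sin(2 * math.pi * t / frames) + 0.05 * (-1) ** t
              for t in range(frames)]
smoothed = low_pass(trajectory, keep=8)
```

Truncating to the leading coefficients keeps the slowly varying structure of the trajectory and discards most of the jitter, which is the intuition behind modeling motion in the frequency domain. A production pipeline would more likely call a library routine such as SciPy's `scipy.fft.dct` than this O(n²) reference loop.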
Funding: Supported by the National Science and Technology Council, Taiwan, under grant no. NSTC 114-2221-E-197-005-MY3.
Abstract: With the development of technology, diffusion model-based solvers have shown significant promise in solving Combinatorial Optimization (CO) problems, particularly in tackling Non-deterministic Polynomial-time hard (NP-hard) problems such as the Traveling Salesman Problem (TSP). However, existing diffusion model-based solvers typically employ a fixed, uniform noise schedule (e.g., linear or cosine annealing) across all training instances, failing to fully account for the unique characteristics of each problem instance. To address this challenge, we present GraphGuided Diffusion Solvers (GGDS), an enhanced method for improving graph-based diffusion models. GGDS leverages Graph Neural Networks (GNNs) to capture graph structural information embedded in node coordinates and adjacency matrices, dynamically adjusting the noise levels in the diffusion model. This study investigates the TSP by examining two distinct time-step noise generation strategies: cosine annealing and a Neural Network (NN)-based approach. We evaluate their performance across different problem scales, particularly after integrating graph structural information. Experimental results indicate that GGDS outperforms previous methods with average performance improvements of 18.7%, 6.3%, and 88.7% on TSP-500, TSP-100, and TSP-50, respectively. Specifically, GGDS demonstrates superior performance on TSP-500 and TSP-50, while its performance on TSP-100 is either comparable to or slightly better than that of previous methods, depending on the chosen noise schedule and decoding strategy.
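To make the "fixed, uniform noise schedule" concrete, the sketch below builds the two instance-independent schedules mentioned above in the form commonly used for Gaussian diffusion models. It assumes the usual DDPM-style parameterization with per-step variances beta_t. The default constants (`beta_start`, `beta_end`, `s`, `max_beta`) are conventional values, not settings reported for GGDS, and the per-instance adjustment that GGDS learns is not shown.

```python
import math
from typing import List


def linear_betas(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly spaced per-step noise variances (requires T >= 2)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cosine_betas(T: int, s: float = 0.008, max_beta: float = 0.999) -> List[float]:
    """Per-step variances derived from a squared-cosine signal level."""
    def alpha_bar(u: float) -> float:
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2

    betas = []
    for t in range(T):
        b = 1.0 - alpha_bar((t + 1) / T) / alpha_bar(t / T)
        betas.append(min(b, max_beta))  # clip to avoid a degenerate last step
    return betas


def cumulative_alpha(betas: List[float]) -> List[float]:
    """Running product of (1 - beta_t): the fraction of signal kept at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


T = 1000
linear_signal = cumulative_alpha(linear_betas(T))
cosine_signal = cumulative_alpha(cosine_betas(T))
```

Both functions depend only on the step index, so every problem instance sees the same noise levels. This is the limitation the abstract attributes to existing solvers.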
Funding: National Natural Science Foundation of China (50434030).
Abstract: A solute diffusion controlled solidification model was used to simulate the initial stage of the cellular-to-dendrite transition of Ti44Al alloys during directional solidification at different velocities. The simulation results show that a mixed structure composed of cells and dendrites forms during this process, in which secondary dendrites are absent on surfaces facing parallel, closely spaced dendrites, in agreement with previous experimental observations. The dendrite spacings are larger than the cellular spacings at a given rate, and the columnar grain spacing sharply increases to a maximum as solidification advances into the coexistence zone. In addition, the simulation revealed that decreasing the number of seeds increases the tendency toward unstable dendrite transition. Finally, the main factors affecting the cell/dendrite transition were analyzed; the likely cause is a change in growth rate that produces slight fluctuations in liquid composition at the growth front. The simulation results are in reasonable agreement with previous theoretical models and with experimental observations at low cooling rates.
Funding: Supported by grants from the National Natural Science Foundation of China (NSFC) (Nos. 81070826, 30872886, 30400497); the Shanghai Rising-Star Program (No. 09QA1403700); the Shanghai Leading Academic Discipline Project (Project No. S30206); and the Science and Technology Commission of Shanghai (08DZ2271100).
Abstract: Aim: The purpose of this study was to develop a mathematical model to quantitatively describe the passive transport of macromolecules within dental biofilms. Methodology: Fluorescently labeled dextrans with different molecular masses (3 kD, 10 kD, 40 kD, 70 kD, 2 000 kD) were used as a series of diffusion probes. Streptococcus mutans, Streptococcus sanguinis, Actinomyces naeslundii and Fusobacterium nucleatum were used as inocula for biofilm formation. The diffusion of the different probes through the in vitro biofilm was recorded with a confocal laser microscope. Results: A mathematical function of biofilm penetration was constructed on the basis of the inverse problem method. Based on this function, both the relationship between the steady-state average concentration and molecular weight and the relationship between penetration time and molecular weight can be analyzed. Conclusion: This function can be used to predict the effective concentration and the penetration time of anti-biofilm medicines that can diffuse through oral biofilm. Furthermore, an improved model for large molecules is proposed by considering the exchange time at the upper boundary of the dental biofilm.
Funding: Supported by the National Natural Science Foundation of China (51278147); the Funds for Creative Research Groups of China (51121062); and the State Key Laboratory of Urban Water Resource and Environment (Harbin Institute of Technology) (2014TS02).
Abstract: Polyvinylpyrrolidone K-30 (PVP) was introduced into the preparation of nano zero-valent iron (nZVI), and the traditional liquid-phase reduction method was improved. The introduction of PVP simplified the traditional method. The nZVI prepared with this new approach showed excellent surface characteristics and high performance in the removal of cadmium. TEM results showed that the nZVI aggregates can reach several micrometers in length but are less than 100 nm in diameter. The iron particles were enclosed by an oxide film less than 10 nm thick, demonstrating that the nZVI possesses a core–shell structure. BET results indicate that the specific surface area of the nZVI was 20.3159 m^2 g^(-1). A three-factor, three-level orthogonal experiment was employed to find the dominant factor affecting the removal rate of cadmium by nZVI. Based on the range values, the factors ranked in order of prominence as: initial pH of the solution > initial concentration of cadmium > dosage of nZVI, with ranges of 96.453, 3.294 and 1.747, respectively. A simulation performed under the same conditions led to the same conclusion; this consistency confirmed that pH is the most significant factor affecting the adsorption efficiency.
Abstract: In this study, the removal of monovalent and divalent cations, Na+, K+, Mg2+, and Ca2+, in a diluted solution from Chott-El Jerid Lake, Tunisia, was investigated with the electrodialysis technique. The process was tested using two cation-exchange membranes: sulfonated polyether sulfone cross-linked with 10% hexamethylenediamine (HEXCl) and sulfonated polyether sulfone grafted with octylamine (S-PESOS). The commercially available membrane Nafion® was used for comparison. The results showed that the Nafion® and S-PESOS membranes had similar removal behaviors, and the investigated cations were ranked in the following descending order in terms of their demineralization rates: Na+ > Ca2+ > Mg2+ > K+. HEXCl removed divalent cations more effectively than monovalent cations. The plots based on the Weber–Morris model showed strong linearity. This reveals that intra-particle diffusion was not the removal rate-determining step, and the removal process was controlled by two or more concurrent mechanisms. The Boyd plots did not pass through the origin, and the sole controlling step was determined by film-diffusion resistance, especially after a long period of electrodialysis. Additionally, a semi-empirical model was established to simulate the temporal variation of the treatment process, and the physical significance and values of the model parameters were compared for the three membranes. The findings of this study indicate that HEXCl and S-PESOS membranes can be efficiently utilized for water softening, especially when effluents are highly loaded with calcium and magnesium ions.
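For reference, the two kinetic diagnostics named above are usually written in the following textbook forms. The symbols are assumptions introduced here, not notation from the paper: q_t is the amount removed at time t, q_e the amount removed at equilibrium, k_id the intra-particle diffusion rate constant, and C the intercept.

```latex
% Weber–Morris intra-particle diffusion model: plot q_t against t^{1/2}
q_t = k_{id}\, t^{1/2} + C

% Boyd model: plot B_t against t, with F the fractional attainment of equilibrium
F = \frac{q_t}{q_e}, \qquad
B_t = -0.4977 - \ln(1 - F) \quad (F > 0.85)
```

In the usual reading, a Weber–Morris line through the origin (C = 0) points to intra-particle diffusion as the sole rate-limiting step, while a nonzero intercept suggests that additional mechanisms contribute. Likewise, a Boyd plot that misses the origin is conventionally taken as evidence of film-diffusion control, which matches the interpretation given in the abstract.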
Funding: Supported by the National Natural Science Foundation of China (Nos. 50725519, 51271048, 51321004).
Abstract: The inner surface modification process by plasma-based low-energy ion implantation (PBLEII) with an electron cyclotron resonance (ECR) microwave plasma source located at the central axis of a cylindrical tube is modeled to optimize the low-energy ion implantation parameters for industrial applications. In this paper, a magnetized plasma diffusion fluid model has been established to describe the plasma nonuniformity caused by plasma diffusion under an axial magnetic field during the pulse-off time of the low pulsed negative bias. Using this plasma density distribution as the initial condition, a sheath collisional fluid model is built to describe the sheath evolution and ion implantation during the pulse-on time. The plasma nonuniformity at the end of the pulse-off time is more apparent along the radial direction than along the axial direction, owing to the geometry of the linear plasma source in the center and the difference between the perpendicular and parallel plasma diffusion coefficients with respect to the magnetic field. The nitrogen plasma densities on the inner and outer surfaces of the tube, normalized to a value of 1 at the central plasma source, are about 0.39 and 0.24, respectively. After a 5 μs pulse-on time, in the area less than 2 cm from the end of the tube, the nitrogen ion implantation energy decreases from 1.5 keV to 1.3 keV and the ion implantation angle increases from several degrees to more than 40°; both variations reduce the nitrogen ion implantation depth. However, the nitrogen ion implantation dose peaks of about 2×10^(10) to 7×10^(10) ions/cm^2 in this area are 2-4 times higher than the doses of 1.18×10^(10) ions/cm^2 and 1.63×10^(10) ions/cm^2 on the inner and outer surfaces of the tube. The sufficient ion implantation dose ensures an acceptable modification effect near the end of the tube under the low-energy and large-angle conditions for nitrogen ion implantation, because the modification effect is mainly determined by the ion implantation dose, just as the mass transfer process in PBLEII is dominated by low-energy ion implantation and thermal diffusion. Therefore, a comparatively uniform surface modification by low-energy nitrogen ion implantation is achieved along the cylindrical tube on both the inner and outer surfaces.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61906168, 62202429 and 62272267); the Zhejiang Provincial Natural Science Foundation of China (No. LY23F020023); and the Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects (No. 2022SDSJ01).
Abstract: Accurately identifying building distribution from remote sensing images with complex background information is challenging. The emergence of diffusion models has prompted the innovative idea of employing the reverse denoising process to distill building distribution from these complex backgrounds. Building on this concept, we propose a novel framework, the building extraction diffusion model (BEDiff), which refines the extraction of building footprints from remote sensing images in a stepwise fashion. Our approach begins with the design of booster guidance, a mechanism that extracts structural and semantic features from remote sensing images to serve as priors, thereby providing targeted guidance for the diffusion process. Additionally, we introduce a cross-feature fusion module (CFM) that bridges the semantic gap between different types of features, so that the attributes extracted by booster guidance are integrated into the diffusion process more effectively. Our proposed BEDiff marks the first application of diffusion models to the task of building extraction. Extensive experiments on the Beijing building dataset demonstrate the superior performance of BEDiff, affirming its effectiveness and potential for enhancing the accuracy of building extraction in complex urban landscapes.
Funding: Partially supported by the National Natural Science Foundation of China (62271485) and the SDHS Science and Technology Project (HS2023B044).
Abstract: Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems (ITS) in the real world. As a state-of-the-art generative model, the diffusion model has proven highly successful in image generation, speech generation, time series modelling, etc., and now opens a new avenue for traffic data imputation. In this paper, we propose a conditional diffusion model, called the implicit-explicit diffusion model, for traffic data imputation. This model exploits both the implicit and explicit features of the data simultaneously. More specifically, we design two types of feature extraction modules, one to capture the implicit dependencies hidden in the raw data at multiple time scales and the other to obtain the long-term temporal dependencies of the time series. This approach not only inherits the advantages of the diffusion model for estimating missing data, but also takes into account the multiscale correlation inherent in traffic data. To illustrate the performance of the model, extensive experiments are conducted on three real-world time series datasets using different missing rates. The experimental results demonstrate that the model improves imputation accuracy and generalization capability.
Funding: Supported by the Yonsei University graduate school Department of Integrative Biotechnology.
Abstract: Recently, diffusion models have emerged as a promising paradigm for molecular design and optimization. However, most diffusion-based molecular generative models focus on modeling 2D graphs or 3D geometries, with limited research on molecular sequence diffusion models. The International Union of Pure and Applied Chemistry (IUPAC) names are more akin to chemical natural language than the simplified molecular input line entry system (SMILES) for organic compounds. In this work, we apply an IUPAC-guided conditional diffusion model to facilitate molecular editing from chemical natural language to chemical language (SMILES) and explore whether the pre-trained generative performance of diffusion models can be transferred to chemical natural language. We propose DiffIUPAC, a controllable molecular editing diffusion model that converts IUPAC names to SMILES strings. Evaluation results demonstrate that our model outperforms existing methods and successfully captures the semantic rules of both chemical languages. Chemical space and scaffold analysis show that the model can generate similar compounds with diverse scaffolds within the specified constraints. Additionally, to illustrate the model's applicability in drug design, we conducted case studies in functional group editing, analogue design and linker design.
Funding: Supported by the Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2024GXJS014, ZDYF2023GXJS163); the National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022, 62162024); and the Collaborative Innovation Project of Hainan University (XTCX2022XXB02).
Abstract: With the rapid development of Internet of Things technology, the sharp increase in network devices and their inherent security vulnerabilities brings unprecedented challenges to the field of network security, especially in identifying malicious attacks. However, due to the uneven distribution of network traffic data, particularly the imbalance between attack traffic and normal traffic, as well as the imbalance between minority-class and majority-class attacks, traditional machine learning detection algorithms have significant limitations when dealing with sparse network traffic data. To tackle this challenge, we have designed a lightweight intrusion detection model based on diffusion mechanisms, named Diff-IDS, with the core objective of enhancing the model's efficiency in parsing complex network traffic features, thereby significantly improving its detection speed and training efficiency. The model begins by filtering network traffic features and converting them into grayscale images, while also employing image-flipping techniques for data augmentation. Subsequently, these preprocessed images are fed into a diffusion model based on the Unet architecture for training. Once the model is trained, we fix the weights of the Unet network and propose a feature enhancement algorithm based on feature masking to further boost the model's expressiveness. Finally, we devise an end-to-end lightweight detection strategy to streamline the model, enabling efficient lightweight detection of imbalanced samples. Our method has been tested in multiple experiments on well-known network intrusion detection benchmarks, including CICIDS 2017, KDD 99, and NSL-KDD. The experimental results indicate that Diff-IDS leads in detection accuracy, training efficiency, and lightweight metrics compared to the current state-of-the-art models, demonstrating exceptional detection capabilities and robustness.
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 61806219, 61876189 and 61703426); the Young Talent Fund of University Association for Science and Technology in Shaanxi, China (Nos. 20190108 and 20220106); and the Innovation Talent Supporting Project of Shaanxi, China (No. 2020KJXX-065).
Abstract: Air target intent recognition holds significant importance in aiding commanders to assess battlefield situations and secure a competitive edge in decision-making. Progress in this domain has been hindered by challenges posed by imbalanced battlefield data and the limited robustness of traditional recognition models. Inspired by the success of diffusion models in addressing visual domain sample imbalances, this paper introduces a new approach that utilizes the Markov Transfer Field (MTF) method for time series data visualization. This visualization, when combined with the Denoising Diffusion Probabilistic Model (DDPM), effectively enhances sample data and mitigates noise within the original dataset. Additionally, a transformer-based model tailored for time series visualization and air target intent recognition is developed. Comprehensive experimental results, encompassing comparative, ablation, and denoising validations, reveal that the proposed method achieves a notable 98.86% accuracy in air target intent recognition while demonstrating exceptional robustness and generalization capabilities. This approach represents a promising avenue for advancing air target intent recognition.
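The time-series-to-image step above can be illustrated with a small sketch of the construction usually called a Markov Transition Field in the literature. It is assumed here that the abstract's "Markov Transfer Field" refers to the same construction. The function name, the rank-based quantile binning, and the bin count are hypothetical choices rather than details taken from the paper.

```python
from typing import List, Sequence


def markov_transition_field(series: Sequence[float], n_bins: int = 4) -> List[List[float]]:
    """Encode a 1-D series as an n x n image of bin-to-bin transition probabilities."""
    n = len(series)
    # 1. Quantile binning by rank: each bin receives roughly n / n_bins samples.
    order = sorted(range(n), key=lambda i: series[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n
    # 2. Count first-order transitions between the bins of consecutive samples.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1
    # 3. Row-normalize the counts into a Markov transition matrix.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    # 4. Pixel (i, j) holds the probability of moving from the bin of sample i
    #    to the bin of sample j.
    return [[probs[bins[i]][bins[j]] for j in range(n)] for i in range(n)]


# Hypothetical 8-sample feature track, encoded with two bins.
field = markov_transition_field([0.1, 0.5, 0.9, 0.4, 0.2, 0.8, 0.7, 0.3], n_bins=2)
```

The resulting square matrix can be treated as a single-channel image, which is what allows an image-domain model such as a DDPM to be applied to time series samples.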
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62202210).
Abstract: The application of generative artificial intelligence (AI) is bringing about notable changes in anime creation. This paper surveys recent advancements and applications of diffusion and language models in anime generation, focusing on their demonstrated potential to enhance production efficiency through automation and personalization. Despite these benefits, it is crucial to acknowledge the substantial initial computational investments required for training and deploying these models. We conduct an in-depth survey of cutting-edge generative AI technologies, encompassing models such as Stable Diffusion and GPT, and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics. The surveyed literature indicates that AI models have reached considerable maturity in synthesizing high-quality, aesthetically compelling anime images from textual prompts, alongside discernible progress in the generation of coherent narratives. However, achieving perfect long-form consistency, mitigating artifacts such as flickering in video sequences, and enabling fine-grained artistic control remain critical ongoing challenges. Building upon these advancements, research efforts have increasingly pivoted towards the synthesis of higher-dimensional content, such as video and three-dimensional assets, with recent studies demonstrating significant progress in this burgeoning field. Nevertheless, formidable challenges remain. Foremost among these are the substantial computational demands of training and deploying these sophisticated models, which are particularly pronounced in high-dimensional generation such as video synthesis. Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and addressing ethical concerns surrounding bias and the preservation of human creative autonomy. This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries. We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models, the integration of multimodal inputs, and the balanced consideration of ethical implications, particularly regarding bias and the preservation of human creative autonomy, thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
Funding: Sponsored by the Tsinghua-Toyota Joint Research Fund.
Abstract: Deep learning has achieved great progress in image recognition, segmentation, semantic recognition and game theory. In this study, a recent deep learning model, the conditional diffusion model, was adopted as a surrogate model to predict heat transfer during the casting process in place of numerical simulation. The conditional diffusion model was established and trained with the geometry shapes, initial temperature fields and temperature fields at t_(i) as the condition and random noise sampled from a standard normal distribution as the input. The output was the temperature field at t_(i+1). Therefore, the temperature field at t_(i+1) can be predicted once the temperature field at t_(i) is known, and the continuous temperature fields of all the time steps can be predicted from the initial temperature field of an arbitrary 2D geometry. A training set with 302 2D shapes and their simulated temperature fields at different time steps was established. The accuracy of the temperature field for a single time step reaches 97.7%, and that for continuous time steps reaches 69.1%, with the main error occurring in the sand mold. The effects of geometry shape and initial temperature field on the prediction accuracy were investigated; the former achieves better results than the latter because it can identify the casting, mold and chill by different colors in the input images. The diffusion model has demonstrated its potential as a surrogate model for numerical simulation of the casting process.
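The step-by-step prediction described above amounts to an autoregressive rollout: each predicted field becomes part of the condition for the next step. The sketch below shows only that control flow. `rollout` and `toy_step` are hypothetical names, and `toy_step` is a simple cooling rule standing in for the trained conditional diffusion model, which is not reproduced here.

```python
from typing import Callable, List

Field = List[List[float]]
StepFn = Callable[[Field, Field, Field], Field]


def rollout(step: StepFn, geometry: Field, initial: Field, n_steps: int) -> List[Field]:
    """Predict fields at t_1..t_n by feeding each prediction back as the next condition."""
    fields = [initial]
    current = initial
    for _ in range(n_steps):
        # Condition on geometry, the initial field, and the field at t_i.
        current = step(geometry, initial, current)
        fields.append(current)
    return fields


def toy_step(geometry: Field, initial: Field, current: Field,
             ambient: float = 25.0, rate: float = 0.1) -> Field:
    """Placeholder for the surrogate: cells flagged in `geometry` relax toward ambient."""
    return [
        [t + rate * (ambient - t) if g else t for g, t in zip(g_row, t_row)]
        for g_row, t_row in zip(geometry, current)
    ]


# Hypothetical 1 x 2 grid: one hot casting cell and one cell held at ambient.
history = rollout(toy_step, geometry=[[1.0, 0.0]], initial=[[125.0, 25.0]], n_steps=2)
```

Because every step consumes the previous prediction, errors can accumulate along the rollout. This is one plausible reading of why the reported accuracy for continuous time steps (69.1%) is lower than the single-step accuracy (97.7%).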
Abstract: Although existing style transfer techniques have made significant progress in image generation, challenges remain in the field of exhibition hall design. Existing style transfer methods mainly focus on the transformation of single-dimensional features and ignore the deep integration of content and style features that exhibition hall design requires. They are also deficient in detail retention, especially in accurately capturing and reproducing local textures and details while preserving the structure of the content image. Moreover, point-based attention mechanisms tend to ignore the complexity and diversity of image features in multi-dimensional space, which causes alignment problems between features in different semantic areas and, in turn, inconsistent stylistic features in content areas. In this context, this paper proposes a semantic-enhanced multimodal style transfer algorithm tailored for exhibition hall design. The proposed approach leverages a multimodal encoder architecture to integrate information from text, source images, and style images, using separate encoder modules for each modality to capture shallow, deep, and semantic features. A novel Style Transfer Convolution (STConv) kernel, based on the Visual Geometry Group (VGG) 19 network, is introduced to improve feature extraction in style transfer. Additionally, an enhanced Transformer encoder is incorporated to capture contextual semantic information within images, while the CLIP model is employed for text data processing. A hybrid attention module is designed to precisely capture style features, achieving multimodal feature fusion via a diffusion model that generates exhibition hall design images aligned with stylistic requirements. Quantitative experiments show that, compared with the most advanced algorithms, the proposed method achieves significant improvements on both the Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) metrics. For example, on the ExpoArchive dataset, the proposed method has an FID value of 87.9 and a KID value of 1.98, which is significantly superior to other methods.
Funding: Supported by a grant from the R&D program "Development of Rail-Specific Digital Resource Technology Based on an AI-Enabled Rail Support Platform" (grant number PK2401C1) of the Korea Railroad Research Institute.
Abstract: Fire detection has been of critical importance in computer vision for over half a century. The development of early fire detection strategies is pivotal to the realization of safe, smart cities that remain inhabitable in the future. However, the development of optimal fire and smoke detection models is hindered by limitations in publicly available datasets, including lack of diversity and class imbalance. In this work, we explore possible ways to overcome these challenges posed by available datasets. We study the impact of a class-balanced dataset on the fire detection capability of state-of-the-art (SOTA) vision-based models and propose the use of generative models for data augmentation as a direction for future work. First, a comparative analysis of two prominent object detection architectures, You Only Look Once version 7 (YOLOv7) and YOLOv8, is carried out using a balanced dataset, with both models evaluated on metrics including precision, recall, and mean Average Precision (mAP). The results are compared with other recent fire detection models, highlighting the superior performance and efficiency of the YOLOv8 architecture as trained on our balanced dataset. Next, a fractal dimension analysis gives deeper insight into the repetition of patterns in fire, and the effectiveness of the results is demonstrated by a windowing-based inference approach. The proposed use of Slicing-Aided Hyper Inference (SAHI) improves the fire and smoke detection capability of YOLOv8 for real-life applications, with significantly improved mAP at a strict confidence threshold. YOLOv8 with SAHI inference gives an mAP@50-95 improvement of more than 25% compared to the base YOLOv8 model. The study also provides insights into future work by exploring the potential of generative models such as the deep convolutional generative adversarial network (DCGAN) and diffusion models such as Stable Diffusion for data augmentation.
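The windowing-based inference mentioned in this abstract rests on cutting a large image into overlapping tiles, running the detector on each tile, and shifting the detections back into full-image coordinates. The sketch below illustrates only that slicing and coordinate mapping under assumed parameters. It does not use the SAHI library's API, and `slice_windows`, `to_full_image`, the tile size, and the overlap are hypothetical choices, not values reported by the authors.

```python
def slice_windows(width, height, tile=640, overlap_px=128):
    """Return (x0, y0, x1, y1) tiles that cover the image with overlap.

    Tiles at the right and bottom edges are shifted inward so that every
    tile stays inside the image and keeps the full tile size when possible.
    """
    step = max(1, tile - overlap_px)
    boxes = []
    y = 0
    while True:
        y1 = min(y + tile, height)
        y0 = max(0, y1 - tile)
        x = 0
        while True:
            x1 = min(x + tile, width)
            x0 = max(0, x1 - tile)
            boxes.append((x0, y0, x1, y1))
            if x1 >= width:
                break
            x += step
        if y1 >= height:
            break
        y += step
    return boxes

def to_full_image(detection, window):
    """Shift a tile-local box (x0, y0, x1, y1) into full-image coordinates."""
    dx, dy = window[0], window[1]
    bx0, by0, bx1, by1 = detection
    return (bx0 + dx, by0 + dy, bx1 + dx, by1 + dy)

windows = slice_windows(1000, 600, tile=500, overlap_px=100)
# A hypothetical detection found inside the last tile, mapped back.
mapped = to_full_image((10, 20, 60, 80), windows[-1])
```

In a full pipeline, the mapped detections from all tiles would still need to be merged (for example with non-maximum suppression) because objects in the overlap regions are detected more than once.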