The collection and annotation of large-scale bird datasets are resource-intensive and time-consuming processes that significantly limit the scalability and accuracy of biodiversity monitoring systems. While self-supervised learning (SSL) has emerged as a promising approach for leveraging unannotated data, current SSL methods face two critical challenges in bird species recognition: (1) long-tailed data distributions that result in poor performance on underrepresented species; and (2) domain shift issues caused by data augmentation strategies designed to mitigate class imbalance. Here we present SDNet, a novel SSL-based bird recognition framework that integrates diffusion models with large language models (LLMs) to overcome these limitations. SDNet employs LLMs to generate semantically rich textual descriptions for tail-class species by prompting the models with species taxonomy, morphological attributes, and habitat information, producing detailed natural language priors that capture fine-grained visual characteristics (e.g., plumage patterns, body proportions, and distinctive markings). These textual descriptions are subsequently used by a conditional diffusion model to synthesize new bird image samples through cross-attention mechanisms that fuse textual embeddings with intermediate visual feature representations during the denoising process, ensuring generated images preserve species-specific morphological details while maintaining photorealistic quality. Additionally, we incorporate a Swin Transformer as the feature extraction backbone, whose hierarchical window-based attention mechanism and shifted windowing scheme enable multi-scale local feature extraction that proves particularly effective at capturing fine-grained discriminative patterns (such as beak shape and feather texture) while mitigating domain shift between synthetic and original images through consistent feature representations across both data sources. SDNet is validated on both a self-constructed dataset (Bird_BXS) and a publicly available benchmark (Birds_25), demonstrating substantial improvements over conventional SSL approaches. Our results indicate that the synergistic integration of LLMs, diffusion models, and the Swin Transformer architecture contributes significantly to recognition accuracy, particularly for rare and morphologically similar species. These findings highlight the potential of SDNet for addressing fundamental limitations of existing SSL methods in avian recognition tasks and establishing a new paradigm for efficient self-supervised learning in large-scale ornithological vision applications.
In this paper, we propose a new privacy-aware transmission scheduling algorithm for 6G ad hoc networks. This system enables end nodes to select the optimum time and scheme to transmit private data safely. In 6G dynamic heterogeneous infrastructures, unstable links and non-uniform hardware capabilities create critical issues regarding security and privacy. Traditional protocols are often too computationally heavy to allow 6G services to achieve their expected Quality-of-Service (QoS). As the transport network is built of ad hoc nodes, there is no guarantee about their trustworthiness or behavior, and transversal functionalities are delegated to the end nodes. However, while security can be guaranteed in end-to-end solutions, privacy cannot, as all intermediate nodes still have to handle the data packets they are transporting. Besides, traditional schemes for private anonymous ad hoc communications are vulnerable to modern intelligent attacks based on learning models. The proposed scheme fills this gap. Findings show that, with the proposed technology, the probability of a successful intelligent attack is reduced by up to 65% compared with ad hoc networks with no privacy protection strategy, while congestion probability can remain below 0.001%, as required in 6G services.
Some patients with systemic lupus erythematosus experience neuropsychiatric symptoms. Although magnetic resonance imaging can detect abnormal signals in the white matter of the brain, conventional methods often struggle to accurately capture microstructural changes. Various diffusion models have been used to study white matter in systemic lupus erythematosus; however, comparative analyses of their sensitivity and specificity for detecting microstructural changes remain insufficient. To address this, our team designed a diagnostic trial that used multimodal diffusion imaging techniques to observe white matter microstructural changes in patients with systemic lupus erythematosus who had neuropsychiatric symptoms, with the aim of identifying key diagnostic biomarkers for these patients. Patients with active lupus who received treatment at the Department of Rheumatology and Immunology, The First Affiliated Hospital of China Medical University, from September 2023 to March 2024 were recruited. According to the standards of the American College of Rheumatology, patients with systemic lupus erythematosus who had neuropsychiatric symptoms were assigned to the systemic lupus erythematosus group, whereas those without neuropsychiatric symptoms were assigned to the non-systemic lupus erythematosus group. Additionally, healthy volunteers matched by region, sex, and age were recruited as controls. All three groups underwent the same diffusion magnetic resonance imaging examination protocol to compare differences in diffusion parameters. Advanced diffusion imaging models were able to sensitively detect microstructural changes in the white matter fibers of patients with systemic lupus erythematosus who had neuropsychiatric symptoms, with specific diffusion parameters showing significant abnormalities in key brain regions. In the left superior longitudinal fasciculus subregion and the right thalamic radiations of patients with systemic lupus erythematosus who had neuropsychiatric symptoms, we also identified abnormal diffusion characteristics that were clearly correlated with disease activity, suggesting that microstructural changes in these areas may reflect the dynamic process of neuroinflammatory damage. The present study addresses critical challenges in the diagnosis of systemic lupus erythematosus by identifying specific white matter imaging biomarkers and elucidating the association between microstructural damage and clinical manifestations. The main contributions of our study include: (1) establishing axial regression probability parameters from mean apparent propagator magnetic resonance imaging as sensitive biomarkers for systemic lupus erythematosus, particularly in the third subregion of the left superior longitudinal fasciculus; (2) demonstrating that multimodal diffusion imaging may be superior to conventional diffusion tensor imaging for detecting white matter microstructural abnormalities in patients with systemic lupus erythematosus; and (3) integrating tract-based spatial statistics with clinically relevant analyses to link imaging findings to pathological mechanisms.
Accurately identifying building distribution from remote sensing images with complex background information is challenging. The emergence of diffusion models has prompted the innovative idea of employing the reverse denoising process to distill building distribution from these complex backgrounds. Building on this concept, we propose a novel framework, the building extraction diffusion model (BEDiff), which refines the extraction of building footprints from remote sensing images in a stepwise fashion. Our approach begins with the design of booster guidance, a mechanism that extracts structural and semantic features from remote sensing images to serve as priors, thereby providing targeted guidance for the diffusion process. Additionally, we introduce a cross-feature fusion module (CFM) that bridges the semantic gap between different types of features, so that the attributes extracted by booster guidance are integrated into the diffusion process more effectively. Our proposed BEDiff marks the first application of diffusion models to the task of building extraction. Empirical evidence from extensive experiments on the Beijing building dataset demonstrates the superior performance of BEDiff, affirming its effectiveness and potential for enhancing the accuracy of building extraction in complex urban landscapes.
Air target intent recognition is important in helping commanders assess battlefield situations and secure a competitive edge in decision-making. Progress in this domain has been hindered by imbalanced battlefield data and the limited robustness of traditional recognition models. Inspired by the success of diffusion models in addressing sample imbalance in the visual domain, this paper introduces a new approach that uses the Markov Transition Field (MTF) method to visualize time series data. This visualization, when combined with the Denoising Diffusion Probabilistic Model (DDPM), effectively augments the sample data and mitigates noise within the original dataset. Additionally, a transformer-based model tailored for time series visualization and air target intent recognition is developed. Comprehensive experimental results, encompassing comparative, ablation, and denoising validations, reveal that the proposed method achieves 98.86% accuracy in air target intent recognition while demonstrating strong robustness and generalization capabilities. This approach represents a promising avenue for advancing air target intent recognition.
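The MTF visualization step named in the abstract above can be illustrated with a minimal sketch. It assumes the standard Markov Transition Field construction (quantile binning, a first-order transition matrix, then a lookup for every pair of time steps); the function name `markov_transition_field` and the parameter `n_bins` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def markov_transition_field(series, n_bins=4):
    """Turn a 1-D time series into an n x n image-like matrix (sketch only)."""
    n = len(series)
    ordered = sorted(series)
    # Interior quantile cut points, so each bin holds roughly n / n_bins samples.
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    # Bin index of every time step.
    bins = [bisect_right(edges, x) for x in series]

    # Count first-order transitions between consecutive time steps.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1

    # Row-normalise the counts into transition probabilities.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])

    # Entry (i, j) is the probability of moving from the bin of step i
    # to the bin of step j.
    return [[probs[bins[i]][bins[j]] for j in range(n)] for i in range(n)]


field = markov_transition_field([0.0, 1.0, 0.0, 1.0, 0.0, 1.0], n_bins=2)
```

The resulting square matrix can be treated as a single-channel image, which is what lets an image-domain generative model such as a DDPM operate on time series data.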
The application of generative artificial intelligence (AI) is bringing about notable changes in anime creation. This paper surveys recent advancements and applications of diffusion and language models in anime generation, focusing on their demonstrated potential to enhance production efficiency through automation and personalization. Despite these benefits, it is crucial to acknowledge the substantial initial computational investments required for training and deploying these models. We conduct an in-depth survey of cutting-edge generative AI technologies, encompassing models such as Stable Diffusion and GPT, and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics. Review of the surveyed literature indicates that AI models have reached considerable maturity in synthesizing high-quality, aesthetically compelling anime images from textual prompts, alongside discernible progress in the generation of coherent narratives. However, achieving perfect long-form consistency, mitigating artifacts like flickering in video sequences, and enabling fine-grained artistic control remain critical ongoing challenges. Building upon these advancements, research efforts have increasingly pivoted towards the synthesis of higher-dimensional content, such as video and three-dimensional assets, with recent studies demonstrating significant progress in this burgeoning field. Nevertheless, formidable challenges endure amidst these advancements. Foremost among these are the substantial computational demands of training and deploying these sophisticated models, which are particularly pronounced in high-dimensional generation such as video synthesis. Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and addressing ethical considerations surrounding bias and the preservation of human creative autonomy. This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries. We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models, the integration of multimodal inputs, and the balanced consideration of ethical implications, particularly regarding bias and the preservation of human creative autonomy, thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
Obtaining unsteady hydrodynamic performance is of great significance for seaplane design. Common methods for obtaining unsteady hydrodynamic performance data include tank tests and Computational Fluid Dynamics (CFD) numerical simulation, both of which are costly and time-consuming. It is therefore necessary to obtain unsteady hydrodynamic performance in a low-cost and high-precision manner. Owing to the strong nonlinearity, complex data distribution, and temporal characteristics of unsteady hydrodynamic performance, its prediction is challenging. This paper proposes a Temporal Convolutional Diffusion Model (TCDM) for predicting the unsteady hydrodynamic performance of seaplanes given design parameters. Under the framework of a classifier-free guided diffusion model, TCDM learns the distribution patterns of unsteady hydrodynamic performance data with a denoising module based on a temporal convolutional network and captures the temporal features of the data. Using CFD simulation data, the proposed method is compared with alternative methods to demonstrate its accuracy and generalization. This paper provides a method that enables the rapid and accurate prediction of unsteady hydrodynamic performance data, which is expected to shorten the design cycle of seaplanes.
Traditional steganography conceals information by modifying cover data, but steganalysis tools easily detect such alterations. Deep learning-based steganography, meanwhile, often involves high training costs and complex deployment. Diffusion model-based methods face security vulnerabilities, particularly due to potential information leakage during generation. We propose a fixed neural network image steganography framework based on secure diffusion models to address these challenges. Unlike conventional approaches, our method minimizes cover modifications through neural network optimization, achieving superior steganographic performance in human visual perception and computer vision analyses. The cover images are generated in an anime style using state-of-the-art diffusion models, ensuring the transmitted images appear more natural. This study introduces fixed neural network technology that allows senders to transmit only minimal critical information alongside stego-images. Recipients can accurately reconstruct secret images using this compact data, significantly reducing transmission overhead compared to conventional deep steganography. Furthermore, our framework integrates ElGamal, a cryptographic algorithm, to protect critical information during transmission, enhancing overall system security and ensuring end-to-end information protection. This dual optimization of payload reduction and cryptographic reinforcement establishes a new paradigm for secure and efficient image steganography.
High-Resolution (HR) data on flow fields are critical for accurately evaluating the aerodynamic performance of aircraft. However, acquiring such data through large-scale numerical simulations or wind tunnel experiments is highly resource intensive. This paper proposes a FlowViT-Diff framework that integrates a Vision Transformer (ViT) with an enhanced denoising diffusion probabilistic model for the Super-Resolution (SR) reconstruction of HR flow fields based on low-resolution inputs. It provides a quick initial prediction of the HR flow field by optimizing the ViT architecture, and incorporates this preliminary output as guidance within an enhanced diffusion model. The latter captures the Gaussian noise distribution during forward diffusion and progressively removes it during backward diffusion to generate the flow field. Experiments on various supercritical airfoils under different flow conditions show that FlowViT-Diff can robustly reconstruct the flow field across multiple levels of downsampling. It obtains more consistent global and local features than traditional SR methods, and yields a 3.6-fold increase in its training speed via transfer learning. Its accuracy of reconstruction of the flow field is 99.7% under ultra-low downsampling. The results demonstrate that FlowViT-Diff not only exhibits effective flow field reconstruction capabilities, but also provides two reconstruction strategies, both of which show effective transferability.
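The forward-diffusion step described in the abstract above (adding Gaussian noise that the backward process later removes) can be sketched with the standard DDPM closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. This is a generic illustration, not the enhanced model from the paper; the linear schedule endpoints (1e-4 to 0.02) are the commonly used DDPM defaults and are an assumption here, as are the names `linear_alpha_bar` and `q_sample`.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    Requires num_steps >= 2. The endpoints are assumed defaults, not
    values reported in the abstract.
    """
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t given x_0 in one shot using the closed-form forward process."""
    a = alpha_bar[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


alpha_bar = linear_alpha_bar(1000)
rng = random.Random(0)
slightly_noisy = q_sample([1.0, 2.0], 0, alpha_bar, rng)    # almost the clean signal
almost_noise = q_sample([1.0, 2.0], 999, alpha_bar, rng)    # almost pure Gaussian noise
```

In a guided setup such as the one the abstract outlines, the denoising network would additionally receive the ViT's preliminary prediction as a conditioning input; that part is not shown here.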
AlphaPanda (AlphaFold2[1] inspired protein-specific antibody design in a diffusional manner) is an advanced algorithm for designing the complementarity-determining regions (CDRs) of antibodies targeting a specific epitope, combining transformer[2] models, 3DCNN[3], and diffusion[4] generative models.
Digital rock analysis (DRA) is fundamental for geo-energy research, enabling the characterisation of microstructures for applications like hydrocarbon recovery, carbon storage, and groundwater modelling. Although 2D CT images provide valuable pore-scale data, the scarcity of real-world datasets limits the effectiveness of advanced analysis. Generative AI presents a promising approach for synthesizing high-quality rock images but faces key challenges, including high computational demands, insufficient evaluation metrics, and the trade-off between image fidelity and diversity. To address these limitations, this study proposes the use of Low-Rank Adaptation (LoRA) for fine-tuning stable diffusion models, significantly reducing computational requirements while maintaining image quality. A systematic investigation was conducted to evaluate the influence of LoRA training parameters, including rank and learning rate, on the quality of generated images. Image outputs were assessed using both standard generative metrics, such as Kernel Inception Distance (KID), and domain-specific metrics, including porosity, pore count, and pore area distributions. The optimised LoRA-enhanced diffusion model achieved a 92.6% reduction in KID relative to baseline models, while also improving inference speed. Building on these advancements, this study demonstrates that the LoRA-enhanced diffusion model significantly improves neural network extrapolation in incomplete data scenarios through statistically consistent synthetic generation. Despite control challenges, this approach reduces costs and enables diverse applications, bridging fundamental rock physics with practical energy research.
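Why LoRA reduces the cost of fine-tuning, and what the rank parameter studied in the abstract above controls, can be seen in a small sketch. It assumes the usual LoRA formulation, in which a frozen weight matrix W is adjusted by a low-rank product scaled by alpha / r; the helper names (`matmul`, `lora_merge`, `trainable_params`) and the layer size 768 are illustrative choices, not details from the study.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w: frozen weights, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Parameters trained by LoRA for one layer, versus d_out * d_in for full tuning."""
    return r * (d_out + d_in)


merged = lora_merge(
    w=[[1.0, 0.0], [0.0, 1.0]],
    a=[[1.0, 2.0]],          # rank r = 1
    b=[[1.0], [0.0]],
    alpha=2.0,
)
full = 768 * 768                           # hypothetical layer size
low_rank = trainable_params(768, 768, 4)   # rank 4
```

With these illustrative sizes, the rank-4 adapter trains 6,144 values where full fine-tuning of the same layer would train 589,824, which is the kind of saving that makes a sweep over rank and learning rate affordable.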
Human motion modeling is a core technology in computer animation, game development, and human-computer interaction. In particular, generating natural and coherent in-between motion using only the initial and terminal frames remains a fundamental yet unresolved challenge. Existing methods typically rely on dense keyframe inputs or complex prior structures, making it difficult to balance motion quality and plausibility under conditions such as sparse constraints, long-term dependencies, and diverse motion styles. To address this, we propose a motion generation framework based on a frequency-domain diffusion model, which aims to better model complex motion distributions and enhance generation stability under sparse conditions. Our method maps motion sequences to the frequency domain via the Discrete Cosine Transform (DCT), enabling more effective modeling of low-frequency motion structures while suppressing high-frequency noise. A denoising network based on self-attention is introduced to capture long-range temporal dependencies and improve global structural awareness. Additionally, a multi-objective loss function is employed to jointly optimize motion smoothness, pose diversity, and anatomical consistency, enhancing the realism and physical plausibility of the generated sequences. Comparative experiments on the Human3.6M and LaFAN1 datasets demonstrate that our method outperforms state-of-the-art approaches across multiple performance metrics, showing stronger capabilities in generating intermediate motion frames. This research offers a new perspective and methodology for human motion generation and holds promise for applications in character animation, game development, and virtual interaction.
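The frequency-domain mapping in the abstract above can be illustrated on a single joint coordinate tracked over time. The sketch below uses an orthonormal DCT-II and its inverse, then keeps only the first few coefficients to show how low-frequency structure is retained while high-frequency content is dropped. The function names and the choice of `keep` are hypothetical; the paper's actual pipeline operates on full pose sequences.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x)
        ))
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        out.append(sum(
            math.sqrt((1 if k == 0 else 2) / n) * v
            * math.cos(math.pi * (i + 0.5) * k / n)
            for k, v in enumerate(c)
        ))
    return out


def low_pass(x, keep):
    """Reconstruct x from its first `keep` DCT coefficients only."""
    coeffs = dct(x)
    return idct([v if k < keep else 0.0 for k, v in enumerate(coeffs)])


trajectory = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]   # one joint coordinate over 6 frames
round_trip = idct(dct(trajectory))
smoothed = low_pass(trajectory, keep=3)
```

Because the transform is orthonormal, the round trip reproduces the input up to floating-point error, so any difference between `trajectory` and `smoothed` comes only from the discarded high-frequency coefficients.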
Scalable simulation leveraging real-world data plays an essential role in advancing autonomous driving, owing to its efficiency and applicability in both training and evaluating algorithms. Consequently, there has been increasing attention on generating highly realistic and consistent driving videos, particularly those involving viewpoint changes guided by the control commands or trajectories of ego vehicles. However, current reconstruction approaches, such as Neural Radiance Fields and 3D Gaussian Splatting, frequently suffer from limited generalization and depend on substantial input data. Meanwhile, 2D generative models, though capable of producing unknown scenes, still have room for improvement in terms of coherence and visual realism. To overcome these challenges, we introduce GenScene, a world model that synthesizes front-view driving videos conditioned on trajectories. A new temporal module is presented to improve video consistency by extracting the global context of each frame, calculating relationships between frames using these global representations, and fusing frame contexts accordingly. Moreover, we propose an attention mechanism that computes relations between pixels within each frame and pixels in the corresponding window range of the initial frame. Extensive experiments show that our approach surpasses various state-of-the-art models in driving video generation, and the introduced modules contribute significantly to model performance. This work establishes a new paradigm for goal-oriented video synthesis in autonomous driving, which facilitates on-demand simulation to expedite algorithm development.
Crack detection accuracy in computer vision is often constrained by limited annotated datasets. Although Generative Adversarial Networks (GANs) have been applied for data augmentation, they frequently introduce blurs and artifacts. To address this challenge, this study leverages Denoising Diffusion Probabilistic Models (DDPMs) to generate high-quality synthetic crack images, enriching the training set with diverse and structurally consistent samples that enhance crack segmentation. The proposed framework involves a two-stage pipeline: first, DDPMs are used to synthesize high-fidelity crack images that capture fine structural details. Second, these generated samples are combined with real data to train segmentation networks, thereby improving accuracy and robustness in crack detection. Compared with GAN-based approaches, DDPM achieved the best fidelity, with the highest Structural Similarity Index (SSIM) (0.302) and lowest Learned Perceptual Image Patch Similarity (LPIPS) (0.461), producing artifact-free images that preserve fine crack details. To validate its effectiveness, six segmentation models were tested, among which LinkNet consistently achieved the best performance, excelling in both region-level accuracy and structural continuity. Incorporating DDPM-augmented data further enhanced segmentation outcomes, increasing F1 scores by up to 1.1% and IoU by 1.7%, while also improving boundary alignment and skeleton continuity compared with models trained on real images alone. Experiments with varying augmentation ratios showed consistent improvements, with F1 rising from 0.946 (no augmentation) to 0.957 and IoU from 0.897 to 0.913 at the highest ratio. These findings demonstrate the effectiveness of diffusion-based augmentation for complex crack detection in structural health monitoring.
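The two segmentation metrics reported in the abstract above, F1 and IoU, are both computed from pixel-level true positives, false positives, and false negatives. The sketch below shows the standard definitions on flattened binary masks; the function name `f1_and_iou` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, and the study may aggregate the metrics differently across images.

```python
def f1_and_iou(pred, truth):
    """Compute (F1, IoU) for two flattened binary masks of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    if tp + fp + fn == 0:
        # Both masks are empty: treat as a perfect match (assumed convention).
        return 1.0, 1.0
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return f1, iou


# One crack pixel found, one missed, one false alarm, one correct background pixel.
f1, iou = f1_and_iou(pred=[1, 1, 0, 0], truth=[1, 0, 1, 0])
```

For a single pair of masks the two metrics are tied by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU.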
With the development of technology, diffusion model-based solvers have shown significant promise in solving Combinatorial Optimization (CO) problems, particularly in tackling Non-deterministic Polynomial-time hard (NP-hard) problems such as the Traveling Salesman Problem (TSP). However, existing diffusion model-based solvers typically employ a fixed, uniform noise schedule (e.g., linear or cosine annealing) across all training instances, failing to fully account for the unique characteristics of each problem instance. To address this challenge, we present Graph-Guided Diffusion Solvers (GGDS), an enhanced method for improving graph-based diffusion models. GGDS leverages Graph Neural Networks (GNNs) to capture graph structural information embedded in node coordinates and adjacency matrices, dynamically adjusting the noise levels in the diffusion model. This study investigates the TSP by examining two distinct time-step noise generation strategies: cosine annealing and a Neural Network (NN)-based approach. We evaluate their performance across different problem scales, particularly after integrating graph structural information. Experimental results indicate that GGDS outperforms previous methods with average performance improvements of 18.7%, 6.3%, and 88.7% on TSP-500, TSP-100, and TSP-50, respectively. Specifically, GGDS demonstrates superior performance on TSP-500 and TSP-50, while its performance on TSP-100 is either comparable to or slightly better than that of previous methods, depending on the chosen noise schedule and decoding strategy.
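The fixed cosine schedule that the abstract above describes as the conventional baseline can be written down in a few lines. The sketch assumes the widely used cosine form for the cumulative signal level, ᾱ(t) = f(t) / f(0) with f(t) = cos²(((t/T + s) / (1 + s)) · π/2); the offset `s = 0.008` and the clipping value `max_beta` are common defaults taken as assumptions here, and nothing below reproduces the graph-conditioned adjustment that GGDS adds.

```python
import math


def cosine_alpha_bar(num_steps, s=0.008):
    """Cumulative signal level alpha_bar(t) for t = 0..num_steps (cosine form)."""
    def f(t):
        return math.cos((t / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return [f(t) / f(0) for t in range(num_steps + 1)]


def betas_from_alpha_bar(alpha_bar, max_beta=0.999):
    """Per-step noise levels implied by consecutive alpha_bar values, clipped."""
    return [
        min(1.0 - alpha_bar[t] / alpha_bar[t - 1], max_beta)
        for t in range(1, len(alpha_bar))
    ]


schedule = cosine_alpha_bar(10)
betas = betas_from_alpha_bar(schedule)
```

Every problem instance receives the same `betas` under this scheme, which is exactly the uniformity the abstract identifies as a limitation.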
In this paper, we are concerned with the stability of traveling wavefronts of a Belousov-Zhabotinsky model with mixed nonlocal and degenerate diffusions. Such a system can be used to study the competition between nonlocally diffusive species and degenerately diffusive species. We prove that the traveling wavefronts are exponentially stable when the initial perturbation around the traveling waves decays exponentially as x→-∞, while in other locations the initial data can be arbitrarily large. The methods adopted are the weighted energy method, combined with the comparison principle and the squeezing technique.
Rural domestic sewage treatment is critical for environmental protection. This study defines the spatial pattern of villages from the perspective of rural sewage treatment and develops an integrated decision-making system to propose a sewage treatment mode and scheme suitable for local conditions. By considering the village spatial layout and terrain factors, a decision tree model of residential density and terrain type was constructed with accuracies of 76.47% and 96.00%, respectively. Combined with binary probit (probability unit) regression, an appropriate sewage treatment mode for the village was determined with 87.00% accuracy. The Analytic Hierarchy Process (AHP), combined with the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) model, formed the basis for optimal treatment process selection under different emission standards. Verification was conducted in 542 villages across three counties of the Inner Mongolia Autonomous Region, focusing on the standard effluent effect (0.3773), low investment cost (0.3196), and high standard effluent effect (0.5115) to determine the best treatment process for the same emission standard under different needs. The annual environmental and carbon emission benefits of sewage treatment in these villages were estimated. This model matches village density, geographic features, and social development level, and provides scientific support and a theoretical basis for rural sewage treatment decision-making.
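The TOPSIS ranking step named in the abstract above can be sketched as follows. The code follows the textbook procedure (vector normalisation, weighting, distances to the ideal and anti-ideal solutions, closeness score). The criteria, the two candidate processes, and the equal weights are invented for illustration; in the study the weights would come from AHP, and none of the numbers below are taken from it.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness score in [0, 1] for each alternative (higher is better).

    matrix:  rows are alternatives, columns are criteria
    weights: one weight per criterion (for example from AHP)
    benefit: True if larger is better for that criterion, False for costs
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]

    columns = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)   # distance to the ideal solution
        d_neg = math.dist(row, anti)    # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical data: [effluent quality score, investment cost]
processes = [
    [0.9, 100.0],   # process A: better effluent, cheaper
    [0.6, 300.0],   # process B: worse effluent, more expensive
]
scores = topsis(processes, weights=[0.5, 0.5], benefit=[True, False])
```

Changing the weight vector to emphasise effluent quality or investment cost is how the same set of candidate processes can yield different recommendations for different needs.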
Funding (SDNet bird recognition study): supported by the National Natural Science Foundation of China (32471964).
Funding: Funded by the European Commission through the Ruralities project (grant agreement no. 101060876).
Abstract: In this paper, we propose a new privacy-aware transmission scheduling algorithm for 6G ad hoc networks. This system enables end nodes to select the optimum time and scheme to transmit private data safely. In 6G dynamic heterogeneous infrastructures, unstable links and non-uniform hardware capabilities create critical issues regarding security and privacy. Traditional protocols are often too computationally heavy to allow 6G services to achieve their expected Quality-of-Service (QoS). As the transport network is built of ad hoc nodes, there is no guarantee about their trustworthiness or behavior, and transversal functionalities are delegated to the end nodes. However, while security can be guaranteed in end-to-end solutions, privacy cannot, as all intermediate nodes still have to handle the data packets they are transporting. Besides, traditional schemes for private anonymous ad hoc communications are vulnerable to modern intelligent attacks based on learning models. The proposed scheme fills this gap. Findings show that, when the proposed technology is used, the probability of a successful intelligent attack is reduced by up to 65% compared to ad hoc networks with no privacy protection strategy, while congestion probability can remain below 0.001%, as required in 6G services.
Funding: Supported by the National Natural Science Foundation Joint Fund, No. U22A20309 (to PY); the Natural Science Foundation of Liaoning Province, No. 2023-MS-07 (to HuL); and the Unveiling Key Scientific and Technological Projects of Liaoning Province, No. 2021JH1/10400051 (to HuL).
Abstract: Some patients with systemic lupus erythematosus experience neuropsychiatric symptoms. Although magnetic resonance imaging can detect abnormal signals in the white matter of the brain, conventional methods often struggle to accurately capture microstructural changes. Various diffusion models have been used to study white matter in systemic lupus erythematosus; however, comparative analyses of their sensitivity and specificity for detecting microstructural changes remain insufficient. To address this, our team designed a diagnostic trial that used multimodal diffusion imaging techniques to observe white matter microstructural changes in patients with systemic lupus erythematosus who had neuropsychiatric symptoms, with the aim of identifying key diagnostic biomarkers for these patients. Patients with active lupus who received treatment at the Department of Rheumatology and Immunology, The First Affiliated Hospital of China Medical University, from September 2023 to March 2024 were recruited. According to the standards of the American College of Rheumatology, patients with systemic lupus erythematosus who had neuropsychiatric symptoms were assigned to the systemic lupus erythematosus group, whereas those without neuropsychiatric symptoms were assigned to the non-systemic lupus erythematosus group. Additionally, healthy volunteers matched by region, sex, and age were recruited as controls. All three groups underwent the same diffusion magnetic resonance imaging examination protocol to compare differences in diffusion parameters. Advanced diffusion imaging models were able to sensitively detect microstructural changes in the white matter fibers of patients with systemic lupus erythematosus who had neuropsychiatric symptoms, with specific diffusion parameters showing significant abnormalities in key brain regions. In the left superior longitudinal fasciculus subregion and the right thalamic radiations of patients with systemic lupus erythematosus who had neuropsychiatric symptoms, we also identified abnormal diffusion characteristics that were clearly correlated with disease activity, suggesting that microstructural changes in these areas may reflect the dynamic process of neuroinflammatory damage. The present study addresses critical challenges in the diagnosis of systemic lupus erythematosus by identifying specific white matter imaging biomarkers and elucidating the association between microstructural damage and clinical manifestations. The main contributions of our study include: 1) establishing axial regression probability parameters from mean apparent propagator magnetic resonance imaging as sensitive biomarkers for systemic lupus erythematosus, particularly in the third subregion of the left superior longitudinal fasciculus; 2) demonstrating that multimodal diffusion imaging may be superior to conventional diffusion tensor imaging for detecting white matter microstructural abnormalities in patients with systemic lupus erythematosus; and 3) integrating tract-based spatial statistics with clinically relevant analyses to link imaging findings to pathological mechanisms.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61906168, 62202429 and 62272267); the Zhejiang Provincial Natural Science Foundation of China (No. LY23F020023); and the Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects (No. 2022SDSJ01).
Abstract: Accurately identifying building distribution from remote sensing images with complex background information is challenging. The emergence of diffusion models has prompted the innovative idea of employing the reverse denoising process to distill building distribution from these complex backgrounds. Building on this concept, we propose a novel framework, the building extraction diffusion model (BEDiff), which refines the extraction of building footprints from remote sensing images in a stepwise fashion. Our approach begins with the design of booster guidance, a mechanism that extracts structural and semantic features from remote sensing images to serve as priors, thereby providing targeted guidance for the diffusion process. Additionally, we introduce a cross-feature fusion module (CFM) that bridges the semantic gap between different types of features, integrating the attributes extracted by booster guidance into the diffusion process more effectively. Our proposed BEDiff marks the first application of diffusion models to the task of building extraction. Empirical evidence from extensive experiments on the Beijing building dataset demonstrates the superior performance of BEDiff, affirming its effectiveness and potential for enhancing the accuracy of building extraction in complex urban landscapes.
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 61806219, 61876189 and 61703426); the Young Talent Fund of University Association for Science and Technology in Shaanxi, China (Nos. 20190108 and 20220106); and the Innovation Talent Supporting Project of Shaanxi, China (No. 2020KJXX-065).
Abstract: Air target intent recognition holds significant importance in aiding commanders to assess battlefield situations and secure a competitive edge in decision-making. Progress in this domain has been hindered by challenges posed by imbalanced battlefield data and the limited robustness of traditional recognition models. Inspired by the success of diffusion models in addressing visual domain sample imbalances, this paper introduces a new approach that utilizes the Markov Transfer Field (MTF) method for time series data visualization. This visualization, when combined with the Denoising Diffusion Probabilistic Model (DDPM), effectively enhances sample data and mitigates noise within the original dataset. Additionally, a transformer-based model tailored for time series visualization and air target intent recognition is developed. Comprehensive experimental results, encompassing comparative, ablation, and denoising validations, reveal that the proposed method achieves 98.86% accuracy in air target intent recognition while demonstrating strong robustness and generalization capabilities. This approach represents a promising avenue for advancing air target intent recognition.
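The MTF step in the abstract above turns a one-dimensional time series into an image. The sketch below shows the usual construction (more commonly called a Markov Transition Field), assuming quantile binning and a first-order transition matrix. The function names, the bin count, and the binning rule are illustrative assumptions; the abstract does not state the authors' settings.

```python
from bisect import bisect_right


def quantile_bins(series, n_bins):
    """Assign each value to one of n_bins quantile bins (0 .. n_bins - 1)."""
    ordered = sorted(series)
    n = len(ordered)
    # Interior bin edges taken at equally spaced ranks of the sorted data.
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    return [bisect_right(edges, x) for x in series]


def markov_transition_field(series, n_bins=4):
    """Return an N x N matrix whose (i, j) entry is the estimated
    probability of moving from the bin of x_i to the bin of x_j."""
    bins = quantile_bins(series, n_bins)

    # Count first-order transitions between consecutive time steps.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1

    # Row-normalise the counts into transition probabilities.
    trans = []
    for row in counts:
        total = sum(row)
        trans.append([c / total if total else 0.0 for c in row])

    # Spread the transition probabilities over all pairs of time steps.
    return [[trans[bi][bj] for bj in bins] for bi in bins]


if __name__ == "__main__":
    field = markov_transition_field([1.0, 2.0, 3.0, 4.0], n_bins=2)
    for row in field:
        print(row)
```

The resulting matrix can be rendered as a grayscale image, which is the form an image-based diffusion model or vision transformer would consume.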
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62202210).
Abstract: The application of generative artificial intelligence (AI) is bringing about notable changes in anime creation. This paper surveys recent advancements and applications of diffusion and language models in anime generation, focusing on their demonstrated potential to enhance production efficiency through automation and personalization. Despite these benefits, it is crucial to acknowledge the substantial initial computational investments required for training and deploying these models. We conduct an in-depth survey of cutting-edge generative AI technologies, encompassing models such as Stable Diffusion and GPT, and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics. The surveyed literature indicates that AI models have reached considerable maturity in synthesizing high-quality, aesthetically compelling anime images from textual prompts, alongside discernible progress in the generation of coherent narratives. However, achieving perfect long-form consistency, mitigating artifacts like flickering in video sequences, and enabling fine-grained artistic control remain critical ongoing challenges. Building upon these advancements, research efforts have increasingly pivoted towards the synthesis of higher-dimensional content, such as video and three-dimensional assets, with recent studies demonstrating significant progress in this burgeoning field. Nevertheless, formidable challenges endure. Foremost among these are the substantial computational demands of training and deploying these sophisticated models, which are particularly pronounced in high-dimensional generation such as video synthesis. Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and addressing ethical considerations surrounding bias and the preservation of human creative autonomy. This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries. We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models, the integration of multimodal inputs, and the balanced consideration of ethical implications, particularly regarding bias and the preservation of human creative autonomy, thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
Funding: Supported by the Aeronautical Science Foundation of China (Nos. 2018ZA52002, 2019ZA052011) and the National Natural Science Foundation of China (No. 12472236).
Abstract: Obtaining unsteady hydrodynamic performance is of great significance for seaplane design. Common methods for obtaining unsteady hydrodynamic performance data include tank tests and Computational Fluid Dynamics (CFD) numerical simulation, which are costly and time-consuming. Therefore, it is necessary to obtain unsteady hydrodynamic performance in a low-cost and high-precision manner. Due to the strong nonlinearity, complex data distribution, and temporal characteristics of unsteady hydrodynamic performance, its prediction is challenging. This paper proposes a Temporal Convolutional Diffusion Model (TCDM) for predicting the unsteady hydrodynamic performance of seaplanes given design parameters. Under the framework of a classifier-free guided diffusion model, TCDM learns the distribution patterns of unsteady hydrodynamic performance data with a denoising module based on a temporal convolutional network and captures the temporal features of the data. Using CFD simulation data, the proposed method is compared with alternative methods to demonstrate its accuracy and generalization. This paper provides a method that enables rapid and accurate prediction of unsteady hydrodynamic performance data, which is expected to shorten the design cycle of seaplanes.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62102450 and 62272478, and the Independent Research Project of a Certain Unit under Grant ZZKY20243127.
Abstract: Traditional steganography conceals information by modifying cover data, but steganalysis tools easily detect such alterations. Deep learning-based steganography, meanwhile, often involves high training costs and complex deployment. Diffusion model-based methods face security vulnerabilities, particularly due to potential information leakage during generation. We propose a fixed neural network image steganography framework based on secure diffusion models to address these challenges. Unlike conventional approaches, our method minimizes cover modifications through neural network optimization, achieving superior steganographic performance in human visual perception and computer vision analyses. The cover images are generated in an anime style using state-of-the-art diffusion models, ensuring the transmitted images appear more natural. This study introduces fixed neural network technology that allows senders to transmit only minimal critical information alongside stego-images. Recipients can accurately reconstruct secret images using this compact data, significantly reducing transmission overhead compared to conventional deep steganography. Furthermore, our framework integrates ElGamal, a cryptographic algorithm, to protect critical information during transmission, enhancing overall system security and ensuring end-to-end information protection. This dual optimization of payload reduction and cryptographic reinforcement establishes a new paradigm for secure and efficient image steganography.
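The abstract above names ElGamal as the algorithm protecting the critical side information. The following is a textbook sketch of ElGamal key generation, encryption, and decryption over a prime field, not the authors' implementation. The modulus `P`, the base `G`, and the function names are assumptions chosen for illustration, and the parameters are toy-sized and not secure.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, NOT a secure parameter choice
G = 3           # assumed public base, for illustration only


def keygen():
    """Return (private_key, public_key) with public_key = G^x mod P."""
    x = secrets.randbelow(P - 3) + 2  # private exponent in [2, P - 2]
    return x, pow(G, x, P)


def encrypt(message, public_key):
    """Encrypt an integer message with 0 < message < P."""
    if not 0 < message < P:
        raise ValueError("message must be an integer in (0, P)")
    k = secrets.randbelow(P - 3) + 2          # fresh ephemeral exponent
    c1 = pow(G, k, P)                          # G^k
    c2 = (message * pow(public_key, k, P)) % P # m * h^k
    return c1, c2


def decrypt(c1, c2, private_key):
    """Recover the message as c2 * (c1^x)^(-1) mod P."""
    shared = pow(c1, private_key, P)
    # Modular inverse via Fermat's little theorem, valid because P is prime.
    return (c2 * pow(shared, P - 2, P)) % P


if __name__ == "__main__":
    priv, pub = keygen()
    c1, c2 = encrypt(123456789, pub)
    print(decrypt(c1, c2, priv))
```

In a steganographic pipeline of the kind described, the integer message would stand in for the compact data the recipient needs to reconstruct the secret image; how that data is encoded as integers is not specified in the abstract.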
Funding: Supported by the National Natural Science Foundation of China (No. 12472265).
Abstract: High-Resolution (HR) data on flow fields are critical for accurately evaluating the aerodynamic performance of aircraft. However, acquiring such data through large-scale numerical simulations or wind tunnel experiments is highly resource intensive. This paper proposes a FlowViT-Diff framework that integrates a Vision Transformer (ViT) with an enhanced denoising diffusion probabilistic model for the Super-Resolution (SR) reconstruction of HR flow fields based on low-resolution inputs. It provides a quick initial prediction of the HR flow field by optimizing the ViT architecture, and incorporates this preliminary output as guidance within an enhanced diffusion model. The latter captures the Gaussian noise distribution during forward diffusion and progressively removes it during backward diffusion to generate the flow field. Experiments on various supercritical airfoils under different flow conditions show that FlowViT-Diff can robustly reconstruct the flow field across multiple levels of downsampling. It obtains more consistent global and local features than traditional SR methods, and yields a 3.6-fold increase in its training speed via transfer learning. Its accuracy of reconstruction of the flow field is 99.7% under ultra-low downsampling. The results demonstrate that FlowViT-Diff not only exhibits effective flow field reconstruction capabilities, but also provides two reconstruction strategies, both of which show effective transferability.
Funding: Supported by the Key Project of International Cooperation of Qilu University of Technology (Grant No.: QLUTGJHZ2018008); Shandong Provincial Natural Science Foundation Committee, China (Grant No.: ZR2016HB54); and Shandong Provincial Key Laboratory of Microbial Engineering (SME).
Abstract: AlphaPanda (AlphaFold2 [1] inspired protein-specific antibody design in a diffusional manner) is an advanced algorithm for designing the complementarity-determining regions (CDRs) of antibodies targeting a specific epitope, combining transformer [2] models, 3DCNN [3], and diffusion [4] generative models.
Funding: Funded by Innovate UK (reference number: 10003208) and the China Scholarship Council (Grant No. CSC 202408420030).
Abstract: Digital rock analysis (DRA) is fundamental for geo-energy research, enabling the characterisation of microstructures for applications like hydrocarbon recovery, carbon storage, and groundwater modelling. Although 2D CT images provide valuable pore-scale data, the scarcity of real-world datasets limits the effectiveness of advanced analysis. Generative AI presents a promising approach for synthesizing high-quality rock images but faces key challenges, including high computational demands, insufficient evaluation metrics, and the trade-off between image fidelity and diversity. To address these limitations, this study proposes the use of Low-Rank Adaptation (LoRA) for fine-tuning stable diffusion models, significantly reducing computational requirements while maintaining image quality. A systematic investigation was conducted to evaluate the influence of LoRA training parameters, including rank and learning rate, on the quality of generated images. Image outputs were assessed using both standard generative metrics, such as Kernel Inception Distance (KID), and domain-specific metrics, including porosity, pore count, and pore area distributions. The optimised LoRA-enhanced diffusion model achieved a 92.6% reduction in KID relative to baseline models, while also improving inference speed. Building on these advancements, this study demonstrates that the LoRA-enhanced diffusion model significantly improves neural network extrapolation in incomplete data scenarios through statistically consistent synthetic generation. Despite control challenges, this approach reduces costs and enables diverse applications, bridging fundamental rock physics with practical energy research.
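To make the role of the LoRA rank in the abstract above concrete, the sketch below applies a low-rank update to a single linear layer and counts trainable parameters. It follows the standard LoRA formulation (frozen weight `W` plus a scaled product `B A`), written with plain Python lists. The layer sizes, `alpha`, and the function names are illustrative assumptions, not values from the study.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha):
    """Compute W x + (alpha / r) * B (A x).

    W is the frozen d_out x d_in weight, A is r x d_in, B is d_out x r.
    Only A and B would be trained during fine-tuning.
    """
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]


def trainable_params(d_out, d_in, r):
    """Return (LoRA parameter count, full fine-tuning parameter count)."""
    return r * (d_in + d_out), d_out * d_in


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    A = [[1.0, 0.0]]        # rank r = 1
    B = [[0.0], [0.0]]      # zero-initialised, so the layer starts unchanged
    print(lora_forward([1.0, 1.0], W, A, B, alpha=1.0))
    print(trainable_params(768, 768, 8))
```

Because the trainable count grows linearly in the rank while the full weight grows with the product of the layer dimensions, small ranks keep fine-tuning cheap, which is the trade-off a rank sweep explores.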
Funding: Supported by the National Natural Science Foundation of China (Grant No. 72161034).
Abstract: Human motion modeling is a core technology in computer animation, game development, and human-computer interaction. In particular, generating natural and coherent in-between motion using only the initial and terminal frames remains a fundamental yet unresolved challenge. Existing methods typically rely on dense keyframe inputs or complex prior structures, making it difficult to balance motion quality and plausibility under conditions such as sparse constraints, long-term dependencies, and diverse motion styles. To address this, we propose a motion generation framework based on a frequency-domain diffusion model, which aims to better model complex motion distributions and enhance generation stability under sparse conditions. Our method maps motion sequences to the frequency domain via the Discrete Cosine Transform (DCT), enabling more effective modeling of low-frequency motion structures while suppressing high-frequency noise. A denoising network based on self-attention is introduced to capture long-range temporal dependencies and improve global structural awareness. Additionally, a multi-objective loss function is employed to jointly optimize motion smoothness, pose diversity, and anatomical consistency, enhancing the realism and physical plausibility of the generated sequences. Comparative experiments on the Human3.6M and LaFAN1 datasets demonstrate that our method outperforms state-of-the-art approaches across multiple performance metrics, showing stronger capabilities in generating intermediate motion frames. This research offers a new perspective and methodology for human motion generation and holds promise for applications in character animation, game development, and virtual interaction.
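The DCT step in the abstract above can be illustrated on a single joint trajectory. The sketch below implements an orthonormal DCT-II and its inverse in plain Python and zeroes the high-frequency coefficients, which is one simple way to keep low-frequency motion structure while discarding high-frequency noise. The function names and the `keep` cutoff are assumptions for illustration; the abstract does not say how many coefficients the authors retain.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(
            v * math.cos(math.pi * (i + 0.5) * k / n)
            for i, v in enumerate(signal)
        )
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n)
            * c
            * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def low_pass(signal, keep):
    """Keep the first `keep` DCT coefficients and reconstruct the signal."""
    coeffs = dct(signal)
    truncated = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(truncated)


if __name__ == "__main__":
    trajectory = [0.0, 0.9, 2.1, 2.9, 4.2, 5.0, 5.8, 7.1]  # made-up joint angle
    print(low_pass(trajectory, keep=3))
```

A full motion sequence would apply the same transform along the time axis for every joint channel, and a frequency-domain diffusion model would then operate on the coefficient arrays rather than on raw frames.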
Funding: Supported by the Cultivation Program for Major Scientific Research Projects of Harbin Institute of Technology (ZDXMPY20180109).
Abstract: Scalable simulation leveraging real-world data plays an essential role in advancing autonomous driving, owing to its efficiency and applicability in both training and evaluating algorithms. Consequently, there has been increasing attention on generating highly realistic and consistent driving videos, particularly those involving viewpoint changes guided by the control commands or trajectories of ego vehicles. However, current reconstruction approaches, such as Neural Radiance Fields and 3D Gaussian Splatting, frequently suffer from limited generalization and depend on substantial input data. Meanwhile, 2D generative models, though capable of producing unknown scenes, still have room for improvement in terms of coherence and visual realism. To overcome these challenges, we introduce GenScene, a world model that synthesizes front-view driving videos conditioned on trajectories. A new temporal module is presented to improve video consistency by extracting the global context of each frame, calculating relationships between frames using these global representations, and fusing frame contexts accordingly. Moreover, we propose an attention mechanism that computes relations between pixels within each frame and pixels in the corresponding window range of the initial frame. Extensive experiments show that our approach surpasses various state-of-the-art models in driving video generation, and the introduced modules contribute significantly to model performance. This work establishes a new paradigm for goal-oriented video synthesis in autonomous driving, which facilitates on-demand simulation to expedite algorithm development.
Funding: Supported by the National Natural Science Foundation of China (Grant No.: 52508343) and the Fundamental Research Funds for the Central Universities (Grant No.: B250201004).
Abstract: Crack detection accuracy in computer vision is often constrained by limited annotated datasets. Although Generative Adversarial Networks (GANs) have been applied for data augmentation, they frequently introduce blurs and artifacts. To address this challenge, this study leverages Denoising Diffusion Probabilistic Models (DDPMs) to generate high-quality synthetic crack images, enriching the training set with diverse and structurally consistent samples that enhance crack segmentation. The proposed framework involves a two-stage pipeline: first, DDPMs are used to synthesize high-fidelity crack images that capture fine structural details; second, these generated samples are combined with real data to train segmentation networks, thereby improving accuracy and robustness in crack detection. Compared with GAN-based approaches, DDPM achieved the best fidelity, with the highest Structural Similarity Index (SSIM) (0.302) and lowest Learned Perceptual Image Patch Similarity (LPIPS) (0.461), producing artifact-free images that preserve fine crack details. To validate its effectiveness, six segmentation models were tested, among which LinkNet consistently achieved the best performance, excelling in both region-level accuracy and structural continuity. Incorporating DDPM-augmented data further enhanced segmentation outcomes, increasing F1 scores by up to 1.1% and IoU by 1.7%, while also improving boundary alignment and skeleton continuity compared with models trained on real images alone. Experiments with varying augmentation ratios showed consistent improvements, with F1 rising from 0.946 (no augmentation) to 0.957 and IoU from 0.897 to 0.913 at the highest ratio. These findings demonstrate the effectiveness of diffusion-based augmentation for complex crack detection in structural health monitoring.
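The F1 and IoU figures quoted in the abstract above are standard overlap metrics for binary segmentation. The sketch below computes both from a predicted and a ground-truth mask using their usual definitions. The flat-list mask representation and the function names are assumptions for illustration; the abstract does not describe the authors' evaluation code or how scores are averaged over images.

```python
def confusion_counts(predicted, truth):
    """Count true positives, false positives and false negatives
    for two equal-length binary masks (1 = crack pixel)."""
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 1)
    return tp, fp, fn


def f1_score(predicted, truth):
    """F1 = 2 TP / (2 TP + FP + FN); defined as 0.0 when the denominator is zero."""
    tp, fp, fn = confusion_counts(predicted, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iou(predicted, truth):
    """IoU = TP / (TP + FP + FN); defined as 0.0 when the denominator is zero."""
    tp, fp, fn = confusion_counts(predicted, truth)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    mask = [1, 0, 1, 0]
    print(f1_score(pred, mask), iou(pred, mask))
```

For a two-dimensional mask, the rows can be concatenated into one flat list before calling these functions.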
Funding: Supported by the National Science and Technology Council, Taiwan, under grant no. NSTC 114-2221-E-197-005-MY3.
Abstract: With the development of technology, diffusion model-based solvers have shown significant promise in solving Combinatorial Optimization (CO) problems, particularly in tackling Non-deterministic Polynomial-time hard (NP-hard) problems such as the Traveling Salesman Problem (TSP). However, existing diffusion model-based solvers typically employ a fixed, uniform noise schedule (e.g., linear or cosine annealing) across all training instances, failing to fully account for the unique characteristics of each problem instance. To address this challenge, we present GraphGuided Diffusion Solvers (GGDS), an enhanced method for improving graph-based diffusion models. GGDS leverages Graph Neural Networks (GNNs) to capture graph structural information embedded in node coordinates and adjacency matrices, dynamically adjusting the noise levels in the diffusion model. This study investigates the TSP by examining two distinct time-step noise generation strategies: cosine annealing and a Neural Network (NN)-based approach. We evaluate their performance across different problem scales, particularly after integrating graph structural information. Experimental results indicate that GGDS outperforms previous methods with average performance improvements of 18.7%, 6.3%, and 88.7% on TSP-500, TSP-100, and TSP-50, respectively. Specifically, GGDS demonstrates superior performance on TSP-500 and TSP-50, while its performance on TSP-100 is either comparable to or slightly better than that of previous methods, depending on the chosen noise schedule and decoding strategy.
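The fixed schedules the abstract above contrasts GGDS with can be written down in a few lines. The sketch below generates a linear and a cosine noise schedule in the form commonly used for Gaussian diffusion models, where every instance receives the same per-step noise levels. The default endpoints, the offset `s`, the clipping value, and the function names are conventional choices assumed for illustration; they are not taken from the GGDS paper, and the instance-dependent adjustment it describes is not shown.

```python
import math


def linear_betas(steps, start=1e-4, end=0.02):
    """Noise levels that grow linearly from `start` to `end` (steps >= 2)."""
    return [start + (end - start) * t / (steps - 1) for t in range(steps)]


def cosine_betas(steps, s=0.008, max_beta=0.999):
    """Noise levels derived from a squared-cosine cumulative signal curve."""

    def alpha_bar(t):
        return math.cos((t / steps + s) / (1 + s) * math.pi / 2) ** 2

    return [
        min(1 - alpha_bar(t + 1) / alpha_bar(t), max_beta)
        for t in range(steps)
    ]


def cumulative_alpha(betas):
    """Running product of (1 - beta_t): the fraction of signal kept at each step."""
    kept, product = [], 1.0
    for beta in betas:
        product *= 1 - beta
        kept.append(product)
    return kept


if __name__ == "__main__":
    for name, schedule in (("linear", linear_betas(10)), ("cosine", cosine_betas(10))):
        print(name, [round(a, 4) for a in cumulative_alpha(schedule)])
```

Both functions depend only on the step index, which is exactly the property an instance-aware scheduler would relax by conditioning the noise level on features of the problem graph.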
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12261081).
Abstract: In this paper, we are concerned with the stability of traveling wavefronts of a Belousov-Zhabotinsky model with mixed nonlocal and degenerate diffusions. Such a system can be used to study the competition between nonlocally diffusive species and degenerately diffusive species. We prove that the traveling wavefronts are exponentially stable when the initial perturbation around the traveling waves decays exponentially as x → -∞; in other locations, the initial data can be arbitrarily large. The methods adopted are the weighted energy method combined with the comparison principle and the squeezing technique.
Funding: Supported by the Central Government Guiding Local Science and Technology Development Fund Project (No. 2024SZY0343); the Joint Research Program for Ecological Conservation and High Quality Development of the Yellow River Basin (No. 2022-YRUC-01-050205); the Higher Education Scientific Research Project of Inner Mongolia Autonomous Region (No. NJZZ23078); the project of Inner Mongolia "Prairie Talents" Engineering Innovation Entrepreneurship Talent Team; the Major Projects of Erdos Science and Technology (No. 2022EEDSKJZDZX015); and the Innovation Team of the Inner Mongolia Academy of Science and Technology (No. CXTD2023-01-016).
Abstract: Rural domestic sewage treatment is critical for environmental protection. This study defines the spatial pattern of villages from the perspective of rural sewage treatment and develops an integrated decision-making system to propose a sewage treatment mode and scheme suitable for local conditions. By considering the village spatial layout and terrain factors, a decision tree model of residential density and terrain type was constructed with accuracies of 76.47% and 96.00%, respectively. Combined with binary classification probability unit regression, an appropriate sewage treatment mode for the village was determined with 87.00% accuracy. The Analytic Hierarchy Process (AHP), combined with the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) model, formed the basis for optimal treatment process selection under different emission standards. Verification was conducted in 542 villages across three counties of the Inner Mongolia Autonomous Region, focusing on the standard effluent effect (0.3773), low investment cost (0.3196), and high standard effluent effect (0.5115) to determine the best treatment process for the same emission standard under different needs. The annual environmental and carbon emission benefits of sewage treatment in these villages were estimated. This model matches village density, geographic features, and social development level, and provides scientific support and a theoretical basis for rural sewage treatment decision-making.
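The TOPSIS ranking step named in the abstract above can be sketched as follows, assuming the criterion weights have already been obtained (for example from AHP pairwise comparisons, which are not shown). The decision matrix, the weights, the criteria, and the function name are made-up placeholders for illustration and are not data from the study.

```python
import math


def topsis(matrix, weights, benefit):
    """Rank alternatives by closeness to the ideal solution.

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion (assumed to come from AHP)
    benefit : True where larger is better, False where smaller is better
    Returns one closeness score in [0, 1] per alternative.
    """
    n_criteria = len(weights)

    # Vector-normalise each column, then apply the criterion weight.
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
        for j in range(n_criteria)
    ]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)]
        for row in matrix
    ]

    # Ideal and anti-ideal values per criterion.
    columns = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    scores = []
    for row in weighted:
        d_best = math.sqrt(sum((v - b) ** 2 for v, b in zip(row, best)))
        d_worst = math.sqrt(sum((v - w) ** 2 for v, w in zip(row, worst)))
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.5)
    return scores


if __name__ == "__main__":
    # Hypothetical processes scored on removal efficiency (benefit) and cost (cost).
    processes = [[0.9, 100.0], [0.6, 200.0], [0.8, 150.0]]
    print(topsis(processes, weights=[0.5, 0.5], benefit=[True, False]))
```

The alternative with the highest closeness score is the preferred treatment process under the given weights; changing the weights to emphasise effluent quality or investment cost changes the ranking accordingly.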