In this paper, we propose a new privacy-aware transmission scheduling algorithm for 6G ad hoc networks. The algorithm enables end nodes to select the optimum time and scheme to transmit private data safely. In dynamic, heterogeneous 6G infrastructures, unstable links and non-uniform hardware capabilities create critical security and privacy issues. Traditional protocols are often too computationally heavy to allow 6G services to achieve their expected Quality of Service (QoS). Because the transport network is built from ad hoc nodes, there is no guarantee about their trustworthiness or behavior, and transversal functionalities are delegated to the end nodes. However, while security can be guaranteed in end-to-end solutions, privacy cannot, as all intermediate nodes still have to handle the data packets they are transporting. Moreover, traditional schemes for private, anonymous ad hoc communications are vulnerable to modern intelligent attacks based on learning models. The proposed scheme fills this gap. Results show that, when the proposed technology is used, the probability of a successful intelligent attack is reduced by up to 65% compared to ad hoc networks with no privacy protection strategy, while congestion probability can remain below 0.001%, as required in 6G services.
Some patients with systemic lupus erythematosus experience neuropsychiatric symptoms. Although magnetic resonance imaging can detect abnormal signals in the white matter of the brain, conventional methods often struggle to accurately capture microstructural changes. Various diffusion models have been used to study white matter in systemic lupus erythematosus; however, comparative analyses of their sensitivity and specificity for detecting microstructural changes remain insufficient. To address this, our team designed a diagnostic trial that used multimodal diffusion imaging techniques to observe white matter microstructural changes in patients with systemic lupus erythematosus who had neuropsychiatric symptoms, with an aim to identify key diagnostic biomarkers for these patients. Patients with active lupus who received treatment at the Department of Rheumatology and Immunology, The First Affiliated Hospital of China Medical University, from September 2023 to March 2024 were recruited. According to the standards of the American College of Rheumatology, patients with systemic lupus erythematosus who had neuropsychiatric symptoms were assigned to the systemic lupus erythematosus group, whereas those without neuropsychiatric symptoms were assigned to the non-systemic lupus erythematosus group. Additionally, healthy volunteers matched by region, sex, and age were recruited as controls. All three groups underwent the same diffusion magnetic resonance imaging examination protocol to compare differences in diffusion parameters. Advanced diffusion imaging models were able to sensitively detect microstructural changes in the white matter fibers of patients with systemic lupus erythematosus who had neuropsychiatric symptoms, with specific diffusion parameters showing significant abnormalities in key brain regions. In the left superior longitudinal fasciculus subregion and the right thalamic radiations of patients with systemic lupus erythematosus who had neuropsychiatric symptoms, we also identified abnormal diffusion characteristics that were clearly correlated with disease activity, suggesting that microstructural changes in these areas may reflect the dynamic process of neuroinflammatory damage. The present study addresses critical challenges in the diagnosis of systemic lupus erythematosus by identifying specific white matter imaging biomarkers and elucidating the association between microstructural damage and clinical manifestations. The main contributions of our study include: 1) establishing axial regression probability parameters from mean apparent propagator magnetic resonance imaging as sensitive biomarkers for systemic lupus erythematosus, particularly in the third subregion of the left superior longitudinal fasciculus; 2) demonstrating that multimodal diffusion imaging may be superior to conventional diffusion tensor imaging for detecting white matter microstructural abnormalities in patients with systemic lupus erythematosus; and 3) integrating tract-based spatial statistics with clinically relevant analyses to link imaging findings to pathological mechanisms.
Accurately identifying building distribution from remote sensing images with complex background information is challenging. The emergence of diffusion models has prompted the innovative idea of employing the reverse denoising process to distill building distribution from these complex backgrounds. Building on this concept, we propose a novel framework, the building extraction diffusion model (BEDiff), which meticulously refines the extraction of building footprints from remote sensing images in a stepwise fashion. Our approach begins with the design of booster guidance, a mechanism that extracts structural and semantic features from remote sensing images to serve as priors, thereby providing targeted guidance for the diffusion process. Additionally, we introduce a cross-feature fusion module (CFM) that bridges the semantic gap between different types of features, facilitating the integration of the attributes extracted by booster guidance into the diffusion process more effectively. Our proposed BEDiff marks the first application of diffusion models to the task of building extraction. Empirical evidence from extensive experiments on the Beijing building dataset demonstrates the superior performance of BEDiff, affirming its effectiveness and potential for enhancing the accuracy of building extraction in complex urban landscapes.
Air target intent recognition holds significant importance in aiding commanders to assess battlefield situations and secure a competitive edge in decision-making. Progress in this domain has been hindered by challenges posed by imbalanced battlefield data and the limited robustness of traditional recognition models. Inspired by the success of diffusion models in addressing visual domain sample imbalances, this paper introduces a new approach that utilizes the Markov Transfer Field (MTF) method for time series data visualization. This visualization, when combined with the Denoising Diffusion Probabilistic Model (DDPM), effectively enhances sample data and mitigates noise within the original dataset. Additionally, a transformer-based model tailored for time series visualization and air target intent recognition is developed. Comprehensive experimental results, encompassing comparative, ablation, and denoising validations, reveal that the proposed method achieves a notable 98.86% accuracy in air target intent recognition while demonstrating exceptional robustness and generalization capabilities. This approach represents a promising avenue for advancing air target intent recognition.
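To make the time-series-to-image step in the abstract above concrete, here is a minimal sketch of the MTF construction (more commonly called the Markov Transition Field in the literature). The abstract does not give the authors' binning or image size, so the quantile binning, the `n_bins` parameter, and the function name `markov_transition_field` are illustrative assumptions, not the paper's implementation.

```python
from bisect import bisect_right


def markov_transition_field(series, n_bins=4):
    """Return an n x n matrix (list of lists) for a 1-D series of length n.

    Assumed recipe: quantile-bin the values, estimate a first-order Markov
    transition matrix between bins, then place the transition probability
    between the bins of time steps i and j at position [i][j].
    """
    ordered = sorted(series)
    n = len(series)
    # Interior quantile cut points (n_bins - 1 of them).
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    # Bin index (state) of every sample.
    states = [bisect_right(edges, x) for x in series]

    # Count transitions between consecutive samples.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1

    # Row-normalise counts into transition probabilities.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])

    # Spread the probabilities over every pair of time steps.
    return [[probs[si][sj] for sj in states] for si in states]
```

The resulting matrix can be rendered as a grayscale image; a generative model such as a DDPM would then operate on these images rather than on the raw series.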
The application of generative artificial intelligence (AI) is bringing about notable changes in anime creation. This paper surveys recent advancements and applications of diffusion and language models in anime generation, focusing on their demonstrated potential to enhance production efficiency through automation and personalization. Despite these benefits, it is crucial to acknowledge the substantial initial computational investments required for training and deploying these models. We conduct an in-depth survey of cutting-edge generative AI technologies, encompassing models such as Stable Diffusion and GPT, and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics. Review of the surveyed literature indicates the achievement of considerable maturity in the capacity of AI models to synthesize high-quality, aesthetically compelling anime visual images from textual prompts, alongside discernible progress in the generation of coherent narratives. However, achieving perfect long-form consistency, mitigating artifacts like flickering in video sequences, and enabling fine-grained artistic control remain critical ongoing challenges. Building upon these advancements, research efforts have increasingly pivoted towards the synthesis of higher-dimensional content, such as video and three-dimensional assets, with recent studies demonstrating significant progress in this burgeoning field. Nevertheless, formidable challenges endure amidst these advancements. Foremost among these are the substantial computational demands of training and deploying these sophisticated models, which are particularly pronounced in high-dimensional generation such as video synthesis. Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and addressing ethical concerns surrounding bias and the preservation of human creative autonomy. This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries. We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models, the integration of multimodal inputs, and the balanced consideration of ethical implications, particularly regarding bias and the preservation of human creative autonomy, thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
Obtaining unsteady hydrodynamic performance is of great significance for seaplane design. Common methods for obtaining unsteady hydrodynamic performance data include tank tests and Computational Fluid Dynamics (CFD) numerical simulation, both of which are costly and time-consuming. Therefore, it is necessary to obtain unsteady hydrodynamic performance in a low-cost and high-precision manner. Owing to the strong nonlinearity, complex data distribution, and temporal characteristics of unsteady hydrodynamic performance, predicting it is challenging. This paper proposes a Temporal Convolutional Diffusion Model (TCDM) for predicting the unsteady hydrodynamic performance of seaplanes given design parameters. Under the framework of a classifier-free guided diffusion model, TCDM learns the distribution patterns of unsteady hydrodynamic performance data with a purpose-designed denoising module based on a temporal convolutional network, and captures the temporal features of the data. Using CFD simulation data, the proposed method is compared with alternative methods to demonstrate its accuracy and generalization. This paper provides a method that enables the rapid and accurate prediction of unsteady hydrodynamic performance data, which is expected to shorten the design cycle of seaplanes.
Traditional steganography conceals information by modifying cover data, but steganalysis tools easily detect such alterations. Deep learning-based steganography, meanwhile, often involves high training costs and complex deployment. Diffusion model-based methods face security vulnerabilities, particularly due to potential information leakage during generation. We propose a fixed neural network image steganography framework based on secure diffusion models to address these challenges. Unlike conventional approaches, our method minimizes cover modifications through neural network optimization, achieving superior steganographic performance in human visual perception and computer vision analyses. The cover images are generated in an anime style using state-of-the-art diffusion models, ensuring the transmitted images appear more natural. This study introduces fixed neural network technology that allows senders to transmit only minimal critical information alongside stego-images. Recipients can accurately reconstruct secret images using this compact data, significantly reducing transmission overhead compared to conventional deep steganography. Furthermore, our framework innovatively integrates ElGamal, a cryptographic algorithm, to protect critical information during transmission, enhancing overall system security and ensuring end-to-end information protection. This dual optimization of payload reduction and cryptographic reinforcement establishes a new paradigm for secure and efficient image steganography.
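The abstract above names ElGamal as the cipher protecting the critical side information but gives no parameters. As a rough illustration only, the following sketch shows textbook ElGamal over the integers modulo a prime. The modulus `P`, base `G`, and the function names are assumptions chosen for readability; these toy parameters are not secure and do not reflect the authors' configuration.

```python
import secrets

# Toy parameters (assumed for illustration, NOT secure for real use).
P = 2**127 - 1  # a Mersenne prime used as the modulus
G = 3           # assumed base element


def keygen():
    """Return (private_key, public_key)."""
    x = secrets.randbelow(P - 3) + 2      # private exponent in [2, P - 2]
    return x, pow(G, x, P)


def encrypt(message, public_key):
    """Encrypt an integer message with 0 < message < P."""
    if not 0 < message < P:
        raise ValueError("message must be in the range 1..P-1")
    k = secrets.randbelow(P - 3) + 2      # fresh ephemeral exponent
    c1 = pow(G, k, P)
    c2 = (message * pow(public_key, k, P)) % P
    return c1, c2


def decrypt(c1, c2, private_key):
    """Recover the message by dividing out the shared secret."""
    shared = pow(c1, private_key, P)
    shared_inv = pow(shared, P - 2, P)    # modular inverse via Fermat
    return (c2 * shared_inv) % P
```

In a setting like the one described, the integer `message` would stand in for the compact data that the recipient needs to reconstruct the secret image; how that data is encoded is not stated in the abstract.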
High-Resolution (HR) data on flow fields are critical for accurately evaluating the aerodynamic performance of aircraft. However, acquiring such data through large-scale numerical simulations or wind tunnel experiments is highly resource intensive. This paper proposes FlowViT-Diff, a framework that integrates a Vision Transformer (ViT) with an enhanced denoising diffusion probabilistic model for the Super-Resolution (SR) reconstruction of HR flow fields from low-resolution inputs. It provides a quick initial prediction of the HR flow field by optimizing the ViT architecture, and incorporates this preliminary output as guidance within an enhanced diffusion model. The latter captures the Gaussian noise distribution during forward diffusion and progressively removes it during backward diffusion to generate the flow field. Experiments on various supercritical airfoils under different flow conditions show that FlowViT-Diff can robustly reconstruct the flow field across multiple levels of downsampling. It obtains more consistent global and local features than traditional SR methods, and achieves a 3.6-fold increase in training speed via transfer learning. Its flow field reconstruction accuracy is 99.7% under ultra-low downsampling. The results demonstrate that FlowViT-Diff not only exhibits effective flow field reconstruction capabilities, but also provides two reconstruction strategies, both of which show effective transferability.
AlphaPanda (AlphaFold2 [1] inspired protein-specific antibody design in a diffusional manner) is an advanced algorithm for designing the complementarity-determining regions (CDRs) of antibodies targeting a specific epitope, combining transformer [2] models, 3DCNN [3], and diffusion [4] generative models.
Human motion modeling is a core technology in computer animation, game development, and human-computer interaction. In particular, generating natural and coherent in-between motion using only the initial and terminal frames remains a fundamental yet unresolved challenge. Existing methods typically rely on dense keyframe inputs or complex prior structures, making it difficult to balance motion quality and plausibility under conditions such as sparse constraints, long-term dependencies, and diverse motion styles. To address this, we propose a motion generation framework based on a frequency-domain diffusion model, which aims to better model complex motion distributions and enhance generation stability under sparse conditions. Our method maps motion sequences to the frequency domain via the Discrete Cosine Transform (DCT), enabling more effective modeling of low-frequency motion structures while suppressing high-frequency noise. A denoising network based on self-attention is introduced to capture long-range temporal dependencies and improve global structural awareness. Additionally, a multi-objective loss function is employed to jointly optimize motion smoothness, pose diversity, and anatomical consistency, enhancing the realism and physical plausibility of the generated sequences. Comparative experiments on the Human3.6M and LaFAN1 datasets demonstrate that our method outperforms state-of-the-art approaches across multiple performance metrics, showing stronger capabilities in generating intermediate motion frames. This research offers a new perspective and methodology for human motion generation and holds promise for applications in character animation, game development, and virtual interaction.
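The frequency-domain mapping described in the abstract above can be illustrated with a small sketch. It applies an orthonormal DCT-II along the time axis of a single joint coordinate, keeps only the lowest-frequency coefficients, and inverts the transform. The function names and the `keep` parameter are assumptions made for this example, and the pure-Python loops stand in for a library routine (for instance `scipy.fft.dct` with `norm="ortho"`); the abstract does not describe how the authors implement or truncate the transform.

```python
import math


def _scale(k, n):
    # Orthonormal scaling factor of the DCT-II basis.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(signal):
    """Orthonormal DCT-II of a 1-D sequence (time -> frequency)."""
    n = len(signal)
    return [
        _scale(k, n) * sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(signal)
        )
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct(): orthonormal DCT-III (frequency -> time)."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def low_pass(signal, keep):
    """Zero all but the first `keep` DCT coefficients and reconstruct."""
    coeffs = dct(signal)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct(truncated)
```

Calling `low_pass(joint_trajectory, keep=8)` on a noisy per-joint trajectory returns a smoothed version that retains the slow, large-scale movement, which is the intuition behind modeling low-frequency structure while suppressing high-frequency noise.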
Scalable simulation leveraging real-world data plays an essential role in advancing autonomous driving, owing to its efficiency and applicability in both training and evaluating algorithms. Consequently, there has been increasing attention on generating highly realistic and consistent driving videos, particularly those involving viewpoint changes guided by the control commands or trajectories of ego vehicles. However, current reconstruction approaches, such as Neural Radiance Fields and 3D Gaussian Splatting, frequently suffer from limited generalization and depend on substantial input data. Meanwhile, 2D generative models, though capable of producing unknown scenes, still have room for improvement in terms of coherence and visual realism. To overcome these challenges, we introduce GenScene, a world model that synthesizes front-view driving videos conditioned on trajectories. A new temporal module is presented to improve video consistency by extracting the global context of each frame, calculating relationships of frames using these global representations, and fusing frame contexts accordingly. Moreover, we propose an innovative attention mechanism that computes relations of pixels within each frame and pixels in the corresponding window range of the initial frame. Extensive experiments show that our approach surpasses various state-of-the-art models in driving video generation, and the introduced modules contribute significantly to model performance. This work establishes a new paradigm for goal-oriented video synthesis in autonomous driving, which facilitates on-demand simulation to expedite algorithm development.
Crack detection accuracy in computer vision is often constrained by limited annotated datasets. Although Generative Adversarial Networks (GANs) have been applied for data augmentation, they frequently introduce blurs and artifacts. To address this challenge, this study leverages Denoising Diffusion Probabilistic Models (DDPMs) to generate high-quality synthetic crack images, enriching the training set with diverse and structurally consistent samples that enhance crack segmentation. The proposed framework involves a two-stage pipeline: first, DDPMs are used to synthesize high-fidelity crack images that capture fine structural details; second, these generated samples are combined with real data to train segmentation networks, thereby improving accuracy and robustness in crack detection. Compared with GAN-based approaches, DDPM achieved the best fidelity, with the highest Structural Similarity Index (SSIM) (0.302) and lowest Learned Perceptual Image Patch Similarity (LPIPS) (0.461), producing artifact-free images that preserve fine crack details. To validate its effectiveness, six segmentation models were tested, among which LinkNet consistently achieved the best performance, excelling in both region-level accuracy and structural continuity. Incorporating DDPM-augmented data further enhanced segmentation outcomes, increasing F1 scores by up to 1.1% and IoU by 1.7%, while also improving boundary alignment and skeleton continuity compared with models trained on real images alone. Experiments with varying augmentation ratios showed consistent improvements, with F1 rising from 0.946 (no augmentation) to 0.957 and IoU from 0.897 to 0.913 at the highest ratio. These findings demonstrate the effectiveness of diffusion-based augmentation for complex crack detection in structural health monitoring.
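For readers unfamiliar with the two segmentation metrics quoted in the abstract above, here is a minimal sketch of how F1 and IoU are typically computed for a pair of binary masks. The function name `f1_and_iou` and the convention of returning 1.0 when both masks are empty are assumptions of this example; the abstract does not state how the authors aggregate the metrics over a dataset.

```python
def f1_and_iou(pred, truth):
    """Compute (F1, IoU) for two binary masks given as nested lists.

    A pixel counts as 'crack' when its value is truthy (for example 1).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # crack predicted and present
            elif p and not t:
                fp += 1          # crack predicted but absent
            elif t and not p:
                fn += 1          # crack missed
    if tp + fp + fn == 0:
        return 1.0, 1.0          # both masks empty: treated as a perfect match
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou
```

For a single mask pair the two scores are tied by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU; dataset-level averages need not satisfy this identity exactly.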
With the development of technology, diffusion model-based solvers have shown significant promise in solving Combinatorial Optimization (CO) problems, particularly in tackling Non-deterministic Polynomial-time hard (NP-hard) problems such as the Traveling Salesman Problem (TSP). However, existing diffusion model-based solvers typically employ a fixed, uniform noise schedule (e.g., linear or cosine annealing) across all training instances, failing to fully account for the unique characteristics of each problem instance. To address this challenge, we present GraphGuided Diffusion Solvers (GGDS), an enhanced method for improving graph-based diffusion models. GGDS leverages Graph Neural Networks (GNNs) to capture graph structural information embedded in node coordinates and adjacency matrices, dynamically adjusting the noise levels in the diffusion model. This study investigates the TSP by examining two distinct time-step noise generation strategies: cosine annealing and a Neural Network (NN)-based approach. We evaluate their performance across different problem scales, particularly after integrating graph structural information. Experimental results indicate that GGDS outperforms previous methods with average performance improvements of 18.7%, 6.3%, and 88.7% on TSP-500, TSP-100, and TSP-50, respectively. Specifically, GGDS demonstrates superior performance on TSP-500 and TSP-50, while its performance on TSP-100 is either comparable to or slightly better than that of previous methods, depending on the chosen noise schedule and decoding strategy.
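The fixed schedules that the abstract above contrasts with GGDS can be sketched as follows. The code builds the widely used linear and cosine noise schedules for a Gaussian diffusion process; the default constants (`s=0.008`, `beta_start=1e-4`, `beta_end=0.02`, `max_beta=0.999`) are common choices in the diffusion literature and are assumptions here, as is the reading of "cosine annealing" as this particular cosine schedule. The instance-dependent adjustment that GGDS performs with a GNN is not reproduced.

```python
import math


def cosine_alpha_bar(t, total_steps, s=0.008):
    """Cumulative signal level after step t under the cosine schedule."""
    return math.cos((t / total_steps + s) / (1 + s) * math.pi / 2) ** 2


def cosine_betas(total_steps, s=0.008, max_beta=0.999):
    """Per-step noise variances derived from the cosine alpha_bar curve."""
    betas = []
    for t in range(1, total_steps + 1):
        ratio = cosine_alpha_bar(t, total_steps, s) / cosine_alpha_bar(t - 1, total_steps, s)
        betas.append(min(1.0 - ratio, max_beta))  # clip to avoid beta == 1
    return betas


def linear_betas(total_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly from beta_start to beta_end."""
    if total_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (total_steps - 1)
    return [beta_start + i * step for i in range(total_steps)]
```

Both functions return the same list for every problem instance, which is exactly the uniformity the abstract criticizes; an instance-aware method would instead make the noise level a function of features extracted from the graph.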
In this paper, we are concerned with the stability of traveling wavefronts of a Belousov-Zhabotinsky model with mixed nonlocal and degenerate diffusions. Such a system can be used to study the competition between nonlocally diffusive species and degenerately diffusive species. We prove that the traveling wavefronts are exponentially stable when the initial perturbation around the traveling waves decays exponentially as x→-∞, while elsewhere the initial data can be arbitrarily large. The methods adopted are the weighted energy method combined with the comparison principle and a squeezing technique.
Digital rock analysis (DRA) is fundamental for geo-energy research, enabling the characterisation of microstructures for applications like hydrocarbon recovery, carbon storage, and groundwater modelling. Although 2D CT images provide valuable pore-scale data, the scarcity of real-world datasets limits the effectiveness of advanced analysis. Generative AI presents a promising approach for synthesizing high-quality rock images but faces key challenges, including high computational demands, insufficient evaluation metrics, and the trade-off between image fidelity and diversity. To address these limitations, this study proposes the use of Low-Rank Adaptation (LoRA) for fine-tuning stable diffusion models, significantly reducing computational requirements while maintaining image quality. A systematic investigation was conducted to evaluate the influence of LoRA training parameters, including rank and learning rate, on the quality of generated images. Image outputs were assessed using both standard generative metrics, such as Kernel Inception Distance (KID), and domain-specific metrics, including porosity, pore count, and pore area distributions. The optimised LoRA-enhanced diffusion model achieved a 92.6% reduction in KID relative to baseline models, while also improving inference speed. Building on these advancements, this study demonstrates that the LoRA-enhanced diffusion model significantly improves neural network extrapolation in incomplete data scenarios through statistically consistent synthetic generation. Despite control challenges, this approach reduces costs and enables diverse applications, bridging fundamental rock physics with practical energy research.
In their recent paper, Pereira et al. (2025) claim that validation is overlooked in mapping and modelling of ecosystem services (ES). They state that “many studies lack critical evaluation of the results and no validation is provided” and that “the validation step is largely overlooked”. This assertion may have been true several years ago, for example, when Ochoa and Urbina-Cardona (2017) made a similar observation. However, there has been much work on ES model validation over the last decade.
Hard carbon (HC) has been widely investigated as an anode material for sodium-ion batteries because it offers excellent reversible Na⁺ insertion and extraction. Covalent heteroatom doping of HC has recently attracted attention, as it can expand the interlayer spacing of graphite and thereby adjust the electrochemical storage performance of carbon anodes. However, the doping strategies reported for modified HC have yielded only limited improvement, with little effect on tuning the porous structure. In this study, tannin extract and K₂SO₄ are used as the carbon source and sulfur source, respectively, for the fabrication of HC, in which K₂SO₄ contributes to both heteroatom doping and pore formation. The tannin-derived sulfur-doped carbon anode shows excellent cycle stability, achieving a high reversible capacity of 520.5 mAh/g at a current density of 100 mA/g. Even after 500 cycles at a current density of 3 A/g, a high specific capacity of 236.7 mAh/g and a capacity retention rate of 92.6% are retained. Compared with the initial carbon, the adsorption energy of Na⁺ is several times higher, whereas the Na⁺ diffusion energy barrier is several times lower. Moreover, the full battery assembled with Na₃V₂(PO₄)₃/tannin-based HC demonstrates stable cycling performance. This work demonstrates the potential of the tannin-based electrode as an anode for high-performance sodium-ion batteries (SIBs), and in particular offers an explanation of how Na⁺ storage and solid-electrolyte interface (SEI) stability relate to the electrochemical performance.
Injecting impure CO₂ for enhanced gas recovery (CO₂-EGR) offers a dual benefit by improving natural gas extraction while enabling CO₂ sequestration. However, the interactions between CO₂, N₂, and CH₄ under reservoir conditions require further investigation. This study employs Grand Canonical Monte Carlo (GCMC) and Molecular Dynamics (MD) simulations to quantify the adsorption and diffusion behaviors of CO₂, N₂, and CH₄ in quartz nanopores over a pressure range of 1-24 MPa under varying water saturations and gas compositions. The results indicate that: (1) CO₂ exhibits the broadest energy distribution and the strongest adsorption stability, occupying about 20%-30% more adsorption sites than CH₄ or N₂ and showing the least sensitivity to water saturation, with only a 30% reduction at 50% saturation, compared to 60% for CH₄, giving CO₂ a clear competitive advantage. (2) The adsorption and desorption behaviors are strongly pressure dependent, as increasing pressure reduces the adsorption layer area and shifts gas distribution from adsorption-dominated to free phase. Competitive adsorption analysis reveals that while CO₂ dominates displacement at low pressures, mixtures that contain N₂ achieve higher CH₄ desorption efficiency above 13 MPa by mitigating diffusion resistance. (3) A higher N₂ fraction improves CH₄ diffusion coefficients, thereby facilitating gas mobility and ensuring superior recovery performance under high-pressure conditions. This study advances the fundamental knowledge of microscale gas behavior in tight sandstones and supports the feasibility of impure CO₂ injection as a practical strategy for sustainable gas production.
Recommendation systems are key to boosting user engagement, satisfaction, and retention, particularly on media platforms where personalized content is vital. Sequential recommendation systems learn from user-item interactions to predict future items of interest. However, many current methods rely on unique user and item IDs, limiting their ability to represent users and items effectively, especially in zero-shot learning scenarios where training data is scarce. With the rapid development of Large Language Models (LLMs), researchers are exploring their potential to enhance recommendation systems. However, there is a semantic gap between the linguistic semantics of LLMs and the collaborative semantics of recommendation systems, where items are typically indexed by IDs. Moreover, most research focuses on item representations, neglecting personalized user modeling. To address these issues, we propose a sequential recommendation framework using LLMs, called CIT-Rec, a model that integrates Collaborative semantics for user representation and Image and Text information for item representation to enhance Recommendations. Specifically, by aligning intuitive image information with text containing semantic features, we can more accurately represent items, improving item representation quality. We focus not only on item representations but also on user representations. To more precisely capture users’ personalized preferences, we use traditional sequential recommendation models to train on users’ historical interaction data, effectively capturing behavioral patterns. Finally, by combining LLMs and traditional sequential recommendation models, we allow the LLM to understand linguistic semantics while capturing collaborative semantics. Extensive evaluations on real-world datasets show that our model outperforms baseline methods, effectively combining user interaction history with item visual and textual modalities to provide personalized recommendations.
Funding: European Commission, Ruralities project (grant agreement no. 101060876).
Abstract: In this paper, we propose a new privacy-aware transmission scheduling algorithm for 6G ad hoc networks. This system enables end nodes to select the optimum time and scheme to transmit private data safely. In 6G dynamic heterogeneous infrastructures, unstable links and non-uniform hardware capabilities create critical issues regarding security and privacy. Traditional protocols are often too computationally heavy to allow 6G services to achieve their expected Quality-of-Service (QoS). As the transport network is built of ad hoc nodes, there is no guarantee about their trustworthiness or behavior, and transversal functionalities are delegated to the extreme nodes. However, while security can be guaranteed in extreme-to-extreme solutions, privacy cannot, as all intermediate nodes still have to handle the data packets they are transporting. Besides, traditional schemes for private anonymous ad hoc communications are vulnerable to modern intelligent attacks based on learning models. The proposed scheme fills this gap. Findings show that, when the proposed technology is used, the probability of a successful intelligent attack is reduced by up to 65% compared to ad hoc networks with no privacy protection strategy. Meanwhile, congestion probability can remain below 0.001%, as required in 6G services.
Funding: Supported by the National Natural Science Foundation Joint Fund, No. U22A20309 (to PY); the Natural Science Foundation of Liaoning Province, No. 2023-MS-07 (to HuL); and the Unveiling Key Scientific and Technological Projects of Liaoning Province, No. 2021JH1/10400051 (to HuL).
Abstract: Some patients with systemic lupus erythematosus experience neuropsychiatric symptoms. Although magnetic resonance imaging can detect abnormal signals in the white matter of the brain, conventional methods often struggle to accurately capture microstructural changes. Various diffusion models have been used to study white matter in systemic lupus erythematosus; however, comparative analyses of their sensitivity and specificity for detecting microstructural changes remain insufficient. To address this, our team designed a diagnostic trial that used multimodal diffusion imaging techniques to observe white matter microstructural changes in patients with systemic lupus erythematosus who had neuropsychiatric symptoms, with the aim of identifying key diagnostic biomarkers for these patients. Patients with active lupus who received treatment at the Department of Rheumatology and Immunology, The First Affiliated Hospital of China Medical University, from September 2023 to March 2024 were recruited. According to the standards of the American College of Rheumatology, patients with systemic lupus erythematosus who had neuropsychiatric symptoms were assigned to the systemic lupus erythematosus group, whereas those without neuropsychiatric symptoms were assigned to the non-systemic lupus erythematosus group. Additionally, healthy volunteers matched by region, sex, and age were recruited as controls. All three groups underwent the same diffusion magnetic resonance imaging examination protocol to compare differences in diffusion parameters. Advanced diffusion imaging models were able to sensitively detect microstructural changes in the white matter fibers of patients with systemic lupus erythematosus who had neuropsychiatric symptoms, with specific diffusion parameters showing significant abnormalities in key brain regions. In the left superior longitudinal fasciculus subregion and the right thalamic radiations of patients with systemic lupus erythematosus who had neuropsychiatric symptoms, we also identified abnormal diffusion characteristics that were clearly correlated with disease activity, suggesting that microstructural changes in these areas may reflect the dynamic process of neuroinflammatory damage. The present study addresses critical challenges in the diagnosis of systemic lupus erythematosus by identifying specific white matter imaging biomarkers and elucidating the association between microstructural damage and clinical manifestations. The main contributions of our study include: 1) establishing axial regression probability parameters from mean apparent propagator magnetic resonance imaging as sensitive biomarkers for systemic lupus erythematosus, particularly in the third subregion of the left superior longitudinal fasciculus; 2) demonstrating that multimodal diffusion imaging may be superior to conventional diffusion tensor imaging for detecting white matter microstructural abnormalities in patients with systemic lupus erythematosus; and 3) integrating tract-based spatial statistics with clinically relevant analyses to link imaging findings to pathological mechanisms.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61906168, 62202429 and 62272267); the Zhejiang Provincial Natural Science Foundation of China (No. LY23F020023); and the Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects (No. 2022SDSJ01).
Abstract: Accurately identifying building distribution from remote sensing images with complex background information is challenging. The emergence of diffusion models has prompted the innovative idea of employing the reverse denoising process to distill building distribution from these complex backgrounds. Building on this concept, we propose a novel framework, the building extraction diffusion model (BEDiff), which refines the extraction of building footprints from remote sensing images in a stepwise fashion. Our approach begins with the design of booster guidance, a mechanism that extracts structural and semantic features from remote sensing images to serve as priors, thereby providing targeted guidance for the diffusion process. Additionally, we introduce a cross-feature fusion module (CFM) that bridges the semantic gap between different types of features, so that the attributes extracted by booster guidance are integrated into the diffusion process more effectively. Our proposed BEDiff marks the first application of diffusion models to the task of building extraction. Empirical evidence from extensive experiments on the Beijing building dataset demonstrates the superior performance of BEDiff, affirming its effectiveness and potential for enhancing the accuracy of building extraction in complex urban landscapes.
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 61806219, 61876189 and 61703426); the Young Talent Fund of University Association for Science and Technology in Shaanxi, China (Nos. 20190108 and 20220106); and the Innovation Talent Supporting Project of Shaanxi, China (No. 2020KJXX-065).
Abstract: Air target intent recognition holds significant importance in aiding commanders to assess battlefield situations and secure a competitive edge in decision-making. Progress in this domain has been hindered by challenges posed by imbalanced battlefield data and the limited robustness of traditional recognition models. Inspired by the success of diffusion models in addressing visual domain sample imbalances, this paper introduces a new approach that utilizes the Markov Transfer Field (MTF) method for time series data visualization. This visualization, when combined with the Denoising Diffusion Probabilistic Model (DDPM), effectively enhances sample data and mitigates noise within the original dataset. Additionally, a transformer-based model tailored for time series visualization and air target intent recognition is developed. Comprehensive experimental results, encompassing comparative, ablation, and denoising validations, reveal that the proposed method achieves 98.86% accuracy in air target intent recognition while demonstrating strong robustness and generalization capabilities. This approach represents a promising avenue for advancing air target intent recognition.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62202210).
Abstract: The application of generative artificial intelligence (AI) is bringing about notable changes in anime creation. This paper surveys recent advancements and applications of diffusion and language models in anime generation, focusing on their demonstrated potential to enhance production efficiency through automation and personalization. Despite these benefits, it is crucial to acknowledge the substantial initial computational investments required for training and deploying these models. We conduct an in-depth survey of cutting-edge generative AI technologies, encompassing models such as Stable Diffusion and GPT, and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics. The surveyed literature indicates that AI models have reached considerable maturity in synthesizing high-quality, aesthetically compelling anime images from textual prompts, alongside discernible progress in the generation of coherent narratives. However, achieving perfect long-form consistency, mitigating artifacts like flickering in video sequences, and enabling fine-grained artistic control remain critical ongoing challenges. Building upon these advancements, research efforts have increasingly pivoted towards the synthesis of higher-dimensional content, such as video and three-dimensional assets, with recent studies demonstrating significant progress in this burgeoning field. Nevertheless, formidable challenges endure. Foremost among these are the substantial computational demands of training and deploying these sophisticated models, which are particularly pronounced in high-dimensional generation such as video synthesis. Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and addressing ethical considerations surrounding bias and the preservation of human creative autonomy. This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries. We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models, the integration of multimodal inputs, and the balanced consideration of ethical implications, particularly regarding bias and the preservation of human creative autonomy, thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
Funding: Supported by the Aeronautical Science Foundation of China (Nos. 2018ZA52002, 2019ZA052011) and the National Natural Science Foundation of China (No. 12472236).
Abstract: Obtaining unsteady hydrodynamic performance is of great significance for seaplane design. Common methods for obtaining unsteady hydrodynamic performance data include tank tests and Computational Fluid Dynamics (CFD) numerical simulation, which are costly and time-consuming. Therefore, it is necessary to obtain unsteady hydrodynamic performance in a low-cost and high-precision manner. Predicting unsteady hydrodynamic performance is challenging because of its strong nonlinearity, complex data distribution, and temporal characteristics. This paper proposes a Temporal Convolutional Diffusion Model (TCDM) for predicting the unsteady hydrodynamic performance of seaplanes given design parameters. Under the framework of a classifier-free guided diffusion model, TCDM learns the distribution patterns of unsteady hydrodynamic performance data with a purpose-designed denoising module based on a temporal convolutional network and captures the temporal features of the data. Using CFD simulation data, the proposed method is compared with alternative methods to demonstrate its accuracy and generalization. This paper provides a method that enables the rapid and accurate prediction of unsteady hydrodynamic performance data, which is expected to shorten the design cycle of seaplanes.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62102450 and 62272478, and by the Independent Research Project of a Certain Unit under Grant ZZKY20243127.
Abstract: Traditional steganography conceals information by modifying cover data, but steganalysis tools easily detect such alterations. Deep learning-based steganography, meanwhile, often involves high training costs and complex deployment. Diffusion model-based methods face security vulnerabilities, particularly due to potential information leakage during generation. We propose a fixed neural network image steganography framework based on secure diffusion models to address these challenges. Unlike conventional approaches, our method minimizes cover modifications through neural network optimization, achieving superior steganographic performance in human visual perception and computer vision analyses. The cover images are generated in an anime style using state-of-the-art diffusion models, ensuring the transmitted images appear more natural. This study introduces fixed neural network technology that allows senders to transmit only minimal critical information alongside stego-images. Recipients can accurately reconstruct secret images using this compact data, significantly reducing transmission overhead compared to conventional deep steganography. Furthermore, our framework integrates ElGamal, a cryptographic algorithm, to protect critical information during transmission, enhancing overall system security and ensuring end-to-end information protection. This dual optimization of payload reduction and cryptographic reinforcement establishes a new paradigm for secure and efficient image steganography.
Funding: Supported by the National Natural Science Foundation of China (No. 12472265).
Abstract: High-Resolution (HR) data on flow fields are critical for accurately evaluating the aerodynamic performance of aircraft. However, acquiring such data through large-scale numerical simulations or wind tunnel experiments is highly resource intensive. This paper proposes a FlowViT-Diff framework that integrates a Vision Transformer (ViT) with an enhanced denoising diffusion probabilistic model for the Super-Resolution (SR) reconstruction of HR flow fields based on low-resolution inputs. It provides a quick initial prediction of the HR flow field by optimizing the ViT architecture, and incorporates this preliminary output as guidance within an enhanced diffusion model. The latter captures the Gaussian noise distribution during forward diffusion and progressively removes it during backward diffusion to generate the flow field. Experiments on various supercritical airfoils under different flow conditions show that FlowViT-Diff can robustly reconstruct the flow field across multiple levels of downsampling. It obtains more consistent global and local features than traditional SR methods, and yields a 3.6-fold increase in its training speed via transfer learning. Its flow field reconstruction accuracy is 99.7% under ultra-low downsampling. The results demonstrate that FlowViT-Diff not only exhibits effective flow field reconstruction capabilities, but also provides two reconstruction strategies, both of which show effective transferability.
Funding: Supported by the Key Project of International Cooperation of Qilu University of Technology (Grant No. QLUTGJHZ2018008); the Shandong Provincial Natural Science Foundation Committee, China (Grant No. ZR2016HB54); and the Shandong Provincial Key Laboratory of Microbial Engineering (SME).
Abstract: AlphaPanda (AlphaFold2[1] inspired protein-specific antibody design in a diffusional manner) is an advanced algorithm for designing the complementarity-determining regions (CDRs) of antibodies that target a specific epitope, combining transformer[2] models, 3DCNN[3], and diffusion[4] generative models.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 72161034).
Abstract: Human motion modeling is a core technology in computer animation, game development, and human-computer interaction. In particular, generating natural and coherent in-between motion using only the initial and terminal frames remains a fundamental yet unresolved challenge. Existing methods typically rely on dense keyframe inputs or complex prior structures, making it difficult to balance motion quality and plausibility under conditions such as sparse constraints, long-term dependencies, and diverse motion styles. To address this, we propose a motion generation framework based on a frequency-domain diffusion model, which aims to better model complex motion distributions and enhance generation stability under sparse conditions. Our method maps motion sequences to the frequency domain via the Discrete Cosine Transform (DCT), enabling more effective modeling of low-frequency motion structures while suppressing high-frequency noise. A denoising network based on self-attention is introduced to capture long-range temporal dependencies and improve global structural awareness. Additionally, a multi-objective loss function is employed to jointly optimize motion smoothness, pose diversity, and anatomical consistency, enhancing the realism and physical plausibility of the generated sequences. Comparative experiments on the Human3.6M and LaFAN1 datasets demonstrate that our method outperforms state-of-the-art approaches across multiple performance metrics, showing stronger capabilities in generating intermediate motion frames. This research offers a new perspective and methodology for human motion generation and holds promise for applications in character animation, game development, and virtual interaction.
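As an illustration of the frequency-domain step described in the abstract above, the following minimal sketch applies an orthonormal DCT-II to a single motion channel (for example, one joint coordinate over time) and keeps only the lowest-frequency coefficients. It is not the authors' implementation: the function names, the one-channel layout, and the hard truncation rule `keep` are assumptions made here for illustration only.

```python
import math


def dct_basis(n):
    """Orthonormal DCT-II basis as an n x n list of rows (row k = frequency k)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n)) for t in range(n)]
        )
    return rows


def dct(signal):
    """Forward transform: project the time series onto each cosine basis row."""
    basis = dct_basis(len(signal))
    return [sum(b * x for b, x in zip(row, signal)) for row in basis]


def idct(coeffs):
    """Inverse transform: the basis is orthonormal, so the inverse is its transpose."""
    n = len(coeffs)
    basis = dct_basis(n)
    return [sum(basis[k][t] * coeffs[k] for k in range(n)) for t in range(n)]


def lowpass(signal, keep):
    """Zero every DCT coefficient with index >= keep, then return to the time domain."""
    coeffs = dct(signal)
    truncated = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(truncated)


if __name__ == "__main__":
    # Hypothetical joint trajectory: a slow arc plus frame-to-frame jitter.
    frames = 32
    trajectory = [
        math.sin(math.pi * t / frames) + 0.05 * (-1) ** t for t in range(frames)
    ]
    smoothed = lowpass(trajectory, keep=6)
    print([round(v, 3) for v in smoothed[:5]])
```

Keeping few coefficients preserves the slow overall arc of the motion and discards the jitter, which is the intuition behind modeling motion in the DCT domain.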
Funding: Supported by the Cultivation Program for Major Scientific Research Projects of Harbin Institute of Technology (ZDXMPY20180109).
Abstract: Scalable simulation leveraging real-world data plays an essential role in advancing autonomous driving, owing to its efficiency and applicability in both training and evaluating algorithms. Consequently, there has been increasing attention on generating highly realistic and consistent driving videos, particularly those involving viewpoint changes guided by the control commands or trajectories of ego vehicles. However, current reconstruction approaches, such as Neural Radiance Fields and 3D Gaussian Splatting, frequently suffer from limited generalization and depend on substantial input data. Meanwhile, 2D generative models, though capable of producing unknown scenes, still have room for improvement in terms of coherence and visual realism. To overcome these challenges, we introduce GenScene, a world model that synthesizes front-view driving videos conditioned on trajectories. A new temporal module is presented to improve video consistency by extracting the global context of each frame, calculating the relationships between frames using these global representations, and fusing frame contexts accordingly. Moreover, we propose an attention mechanism that computes the relations between pixels within each frame and pixels in the corresponding window range of the initial frame. Extensive experiments show that our approach surpasses various state-of-the-art models in driving video generation, and the introduced modules contribute significantly to model performance. This work establishes a new paradigm for goal-oriented video synthesis in autonomous driving, which facilitates on-demand simulation to expedite algorithm development.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52508343) and the Fundamental Research Funds for the Central Universities (Grant No. B250201004).
Abstract: Crack detection accuracy in computer vision is often constrained by limited annotated datasets. Although Generative Adversarial Networks (GANs) have been applied for data augmentation, they frequently introduce blurs and artifacts. To address this challenge, this study leverages Denoising Diffusion Probabilistic Models (DDPMs) to generate high-quality synthetic crack images, enriching the training set with diverse and structurally consistent samples that enhance crack segmentation. The proposed framework involves a two-stage pipeline: first, DDPMs are used to synthesize high-fidelity crack images that capture fine structural details. Second, these generated samples are combined with real data to train segmentation networks, thereby improving accuracy and robustness in crack detection. Compared with GAN-based approaches, DDPM achieved the best fidelity, with the highest Structural Similarity Index (SSIM) (0.302) and lowest Learned Perceptual Image Patch Similarity (LPIPS) (0.461), producing artifact-free images that preserve fine crack details. To validate its effectiveness, six segmentation models were tested, among which LinkNet consistently achieved the best performance, excelling in both region-level accuracy and structural continuity. Incorporating DDPM-augmented data further enhanced segmentation outcomes, increasing F1 scores by up to 1.1% and IoU by 1.7%, while also improving boundary alignment and skeleton continuity compared with models trained on real images alone. Experiments with varying augmentation ratios showed consistent improvements, with F1 rising from 0.946 (no augmentation) to 0.957 and IoU from 0.897 to 0.913 at the highest ratio. These findings demonstrate the effectiveness of diffusion-based augmentation for complex crack detection in structural health monitoring.
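The F1 and IoU figures quoted in the abstract above are standard overlap metrics for binary segmentation. The sketch below shows how they are conventionally computed from a predicted and a reference crack mask. It is a generic illustration, not the evaluation code used in the study: the flat 0/1 list representation and the convention of returning 1.0 when both masks are empty are assumptions made here.

```python
def f1_and_iou(pred, target):
    """Return (F1, IoU) for two equally sized flat binary masks (1 = crack pixel)."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)      # crack in both
    fp = sum(1 for p, t in zip(pred, target) if p and not t)  # predicted only
    fn = sum(1 for p, t in zip(pred, target) if not p and t)  # missed crack
    union = tp + fp + fn
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / union
    return f1, iou


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    reference = [1, 0, 1, 0, 1, 0]
    f1, iou = f1_and_iou(predicted, reference)
    print(f"F1={f1:.3f} IoU={iou:.3f}")
```

For a single mask pair the two metrics are tied by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU. Scores averaged over many images need not satisfy this identity exactly.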
Funding: Supported by the National Science and Technology Council, Taiwan, under grant no. NSTC 114-2221-E-197-005-MY3.
Abstract: Diffusion model-based solvers have shown significant promise in solving Combinatorial Optimization (CO) problems, particularly in tackling Non-deterministic Polynomial-time hard (NP-hard) problems such as the Traveling Salesman Problem (TSP). However, existing diffusion model-based solvers typically employ a fixed, uniform noise schedule (e.g., linear or cosine annealing) across all training instances, failing to fully account for the unique characteristics of each problem instance. To address this challenge, we present GraphGuided Diffusion Solvers (GGDS), an enhanced method for improving graph-based diffusion models. GGDS leverages Graph Neural Networks (GNNs) to capture graph structural information embedded in node coordinates and adjacency matrices, dynamically adjusting the noise levels in the diffusion model. This study investigates the TSP by examining two distinct time-step noise generation strategies: cosine annealing and a Neural Network (NN)-based approach. We evaluate their performance across different problem scales, particularly after integrating graph structural information. Experimental results indicate that GGDS outperforms previous methods with average performance improvements of 18.7%, 6.3%, and 88.7% on TSP-500, TSP-100, and TSP-50, respectively. Specifically, GGDS demonstrates superior performance on TSP-500 and TSP-50, while its performance on TSP-100 is either comparable to or slightly better than that of previous methods, depending on the chosen noise schedule and decoding strategy.
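To make the notion of a fixed noise schedule in the abstract above concrete, the sketch below generates the two instance-independent schedules it names, linear and cosine, in the form commonly used for Gaussian diffusion models. This is not GGDS and does not include any graph-conditioned adjustment: the default values (`beta_start`, `beta_end`, the offset `s`, the clipping value `max_beta`) are conventional choices assumed here, and the function names are hypothetical.

```python
import math


def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that grow linearly from beta_start to beta_end."""
    if steps == 1:
        return [beta_start]
    span = beta_end - beta_start
    return [beta_start + span * i / (steps - 1) for i in range(steps)]


def cosine_betas(steps, s=0.008, max_beta=0.999):
    """Per-step variances derived from a squared-cosine cumulative signal curve."""

    def signal_level(u):
        # Fraction of the original signal remaining at normalized time u in [0, 1].
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2

    betas = []
    for i in range(steps):
        ratio = signal_level((i + 1) / steps) / signal_level(i / steps)
        betas.append(min(1.0 - ratio, max_beta))  # clip to avoid beta = 1
    return betas


def cumulative_alpha(betas):
    """Running product of (1 - beta): how much signal survives up to each step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


if __name__ == "__main__":
    for name, schedule in (("linear", linear_betas(10)), ("cosine", cosine_betas(10))):
        remaining = cumulative_alpha(schedule)
        print(name, [round(a, 3) for a in remaining])
```

Both functions return the same list for every problem instance, which is exactly the limitation the abstract points to. An instance-aware method would replace them with values predicted from the graph.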
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12261081).
Abstract: In this paper, we are concerned with the stability of traveling wavefronts of a Belousov-Zhabotinsky model with mixed nonlocal and degenerate diffusions. Such a system can be used to study the competition between nonlocally diffusive species and degenerately diffusive species. We prove that the traveling wavefronts are exponentially stable when the initial perturbation around the traveling waves decays exponentially as x→-∞; in other locations, the initial data can be arbitrarily large. The proof uses the weighted energy method combined with the comparison principle and the squeezing technique.
Funding: Innovate UK (reference number: 10003208) and the China Scholarship Council (Grant No. CSC 202408420030).
Abstract: Digital rock analysis (DRA) is fundamental for geo-energy research, enabling the characterisation of microstructures for applications like hydrocarbon recovery, carbon storage, and groundwater modelling. Although 2D CT images provide valuable pore-scale data, the scarcity of real-world datasets limits the effectiveness of advanced analysis. Generative AI presents a promising approach for synthesizing high-quality rock images but faces key challenges, including high computational demands, insufficient evaluation metrics, and the trade-off between image fidelity and diversity. To address these limitations, this study proposes the use of Low-Rank Adaptation (LoRA) for fine-tuning stable diffusion models, significantly reducing computational requirements while maintaining image quality. A systematic investigation was conducted to evaluate the influence of LoRA training parameters, including rank and learning rate, on the quality of generated images. Image outputs were assessed using both standard generative metrics, such as Kernel Inception Distance (KID), and domain-specific metrics, including porosity, pore count, and pore area distributions. The optimised LoRA-enhanced diffusion model achieved a 92.6% reduction in KID relative to baseline models, while also improving inference speed. Building on these advancements, this study demonstrates that the LoRA-enhanced diffusion model significantly improves neural network extrapolation in incomplete data scenarios through statistically consistent synthetic generation. Despite control challenges, this approach reduces costs and enables diverse applications, bridging fundamental rock physics with practical energy research.
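The computational saving attributed to LoRA in the abstract above comes from training a low-rank correction instead of the full weight matrix. The sketch below shows the general idea on a single layer: the frozen weight W is augmented by (alpha / r)·B·A, where A is r × d_in and B is d_out × r. It is a generic illustration and not the study's training setup: the nested-list matrices, the function names, and the 768 × 768 layer size in the demo are assumptions made here.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the effective weight after adaptation.

    w: frozen weight, d_out x d_in
    a: trainable down-projection, r x d_in
    b: trainable up-projection, d_out x r
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [wij + scale * dij for wij, dij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_fraction(d_out, d_in, r):
    """Share of parameters trained with LoRA relative to full fine-tuning."""
    return r * (d_in + d_out) / (d_out * d_in)


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]
    a = [[1.0, 2.0]]        # rank r = 1
    b = [[1.0], [0.0]]
    print(lora_weight(w, a, b, alpha=2.0))
    # Hypothetical 768 x 768 projection layer adapted at rank 8.
    print(f"{trainable_fraction(768, 768, 8):.2%} of the layer is trainable")
```

The rank r is one of the training parameters the abstract reports studying: a larger r gives the update more capacity at the cost of more trainable parameters.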
Abstract: In their recent paper, Pereira et al. (2025) claim that validation is overlooked in the mapping and modelling of ecosystem services (ES). They state that “many studies lack critical evaluation of the results and no validation is provided” and that “the validation step is largely overlooked”. This assertion may have been true several years ago, for example, when Ochoa and Urbina-Cardona (2017) made a similar observation. However, there has been much work on ES model validation over the last decade.
Funding: Supported by the National Natural Science Foundation of China (Nos. 32271791, 32171709 and 22475053); the Hunan Provincial Natural Science Foundation of China (No. 2024JJ7643); and the Natural Science Foundation of Shanghai (No. 22ZR1404100).
Abstract: Hard carbon (HC) has been widely investigated for sodium-ion batteries because it offers excellent reversible Na⁺ insertion and extraction. Covalent heteroatom doping of HC has recently attracted attention because it can expand the graphitic interlayer spacing and thereby adjust the electrochemical storage performance of carbon anodes. However, the reported doping strategies for modified HC have yielded only limited improvement, with especially little effect on tuning the porous structure. In this study, tannin extract and K₂SO₄ are used as the carbon source and sulfur source, respectively, for the fabrication of HC; K₂SO₄ contributes both to heteroatom doping and to pore formation. The tannin-derived sulfur-doped carbon anode shows excellent cycle stability, achieving a high reversible capacity of 520.5 mAh/g at a current density of 100 mA/g. Even after 500 cycles at a current density of 3 A/g, a high specific capacity of 236.7 mAh/g and a capacity retention rate of 92.6% are retained. Compared with the initial carbon, the adsorption energy of Na⁺ is several times higher, whereas the Na⁺ diffusion energy barriers are several times lower. Moreover, the full battery assembled with Na₃V₂(PO₄)₃/tannin-based HC demonstrates stable cycling performance. This work demonstrates the potential of the tannin-based electrode as an anode for high-performance sodium-ion batteries (SIBs), and in particular offers an explanation of how Na⁺ storage and solid-electrolyte interface (SEI) stability relate to the electrochemical performance.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U23A2022 and 52474047); the Natural Science Foundation of Chongqing (Grant No. CSTB2024NSCQ-MSX0951); the Natural Science Foundation of Sichuan Province (Grant No. 2025ZNSFSC1357); and the National Science and Technology Major Project (Grant No. 2025ZD1404307).
Abstract: Injecting impure CO₂ for enhanced gas recovery (CO₂-EGR) offers a dual benefit by improving natural gas extraction while enabling CO₂ sequestration. However, the interactions between CO₂, N₂, and CH₄ under reservoir conditions require further investigation. This study employs Grand Canonical Monte Carlo (GCMC) and Molecular Dynamics (MD) simulations to quantify the adsorption and diffusion behaviors of CO₂, N₂, and CH₄ in quartz nanopores over a pressure range of 1-24 MPa under varying water saturations and gas compositions. The results indicate that: (1) CO₂ exhibits the broadest energy distribution and the strongest adsorption stability, occupying about 20%-30% more adsorption sites than CH₄ or N₂ and showing the least sensitivity to water saturation, with only a 30% reduction at 50% saturation, compared to 60% for CH₄, giving CO₂ a clear competitive advantage. (2) The adsorption and desorption behaviors are strongly pressure dependent, as increasing pressure reduces the adsorption layer area and shifts gas distribution from adsorption-dominated to free phase. Competitive adsorption analysis reveals that while CO₂ dominates displacement at low pressures, mixtures that contain N₂ achieve higher CH₄ desorption efficiency above 13 MPa by mitigating diffusion resistance. (3) A higher N₂ fraction improves CH₄ diffusion coefficients, thereby facilitating gas mobility and ensuring superior recovery performance under high-pressure conditions. This study advances the fundamental knowledge of microscale gas behavior in tight sandstones and supports the feasibility of impure CO₂ injection as a practical strategy for sustainable gas production.
Funding: Supported by the National Key R&D Program of China (2022YFF0902703) and the State Administration for Market Regulation Science and Technology Plan Project (2024MK033).
Abstract: Recommendation systems are key to boosting user engagement, satisfaction, and retention, particularly on media platforms where personalized content is vital. Sequential recommendation systems learn from user-item interactions to predict future items of interest. However, many current methods rely on unique user and item IDs, limiting their ability to represent users and items effectively, especially in zero-shot learning scenarios where training data is scarce. With the rapid development of Large Language Models (LLMs), researchers are exploring their potential to enhance recommendation systems. However, there is a semantic gap between the linguistic semantics of LLMs and the collaborative semantics of recommendation systems, where items are typically indexed by IDs. Moreover, most research focuses on item representations, neglecting personalized user modeling. To address these issues, we propose a sequential recommendation framework using LLMs, called CIT-Rec, a model that integrates Collaborative semantics for user representation and Image and Text information for item representation to enhance Recommendations. Specifically, by aligning intuitive image information with text containing semantic features, we can represent items more accurately. We focus not only on item representations but also on user representations. To capture users' personalized preferences more precisely, we use traditional sequential recommendation models to train on users' historical interaction data, effectively capturing behavioral patterns. Finally, by combining LLMs and traditional sequential recommendation models, we allow the LLM to understand linguistic semantics while capturing collaborative semantics. Extensive evaluations on real-world datasets show that our model outperforms baseline methods, effectively combining user interaction history with item visual and textual modalities to provide personalized recommendations.