Weakly Supervised Semantic Segmentation (WSSS), which relies only on image-level labels, has attracted significant attention for its cost-effectiveness and scalability. Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations. However, they often neglect the complex contextual dependencies among image patches, resulting in incomplete local representations and limited segmentation accuracy. To address these issues, we propose the Context Patch Fusion with Class Token Enhancement (CPF-CTE) framework, which exploits contextual relations among patches to enrich feature representations and improve segmentation. At its core, the Contextual-Fusion Bidirectional Long Short-Term Memory (CF-BiLSTM) module captures spatial dependencies between patches and enables bidirectional information flow, yielding a more comprehensive understanding of spatial correlations. This strengthens feature learning and segmentation robustness. Moreover, we introduce learnable class tokens that dynamically encode and refine class-specific semantics, enhancing discriminative capability. By effectively integrating spatial and semantic cues, CPF-CTE produces richer and more accurate representations of image content. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.
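The abstract does not specify the CF-BiLSTM internals, but the core idea it names, bidirectional information flow over a sequence of patch features, can be sketched with a simple two-pass recurrence. The decay factor `alpha` and the scalar features below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of bidirectional context fusion over a sequence of
# patch features (scalars here for clarity). Each position receives an
# exponentially decayed summary of its left and right context, mimicking
# the forward/backward passes of a BiLSTM. `alpha` is an assumed decay.

def bidirectional_context(features, alpha=0.5):
    n = len(features)
    fwd = [0.0] * n   # context accumulated left-to-right
    bwd = [0.0] * n   # context accumulated right-to-left
    h = 0.0
    for i in range(n):
        h = alpha * h + features[i]
        fwd[i] = h
    h = 0.0
    for i in reversed(range(n)):
        h = alpha * h + features[i]
        bwd[i] = h
    # Fuse both directions, analogous to concatenating BiLSTM states.
    return [(f, b) for f, b in zip(fwd, bwd)]

fused = bidirectional_context([1.0, 0.0, 0.0, 4.0])
```

Each output pair carries context from both ends of the sequence, so even the first patch "sees" the strong activation at the end.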
Images taken in dim environments frequently exhibit issues like insufficient brightness, noise, color shifts, and loss of detail. These problems pose significant challenges to dark image enhancement tasks. Current approaches, while effective in global illumination modeling, often struggle to simultaneously suppress noise and preserve structural details, especially under heterogeneous lighting. Furthermore, misalignment between luminance and color channels introduces additional challenges to accurate enhancement. In response to these difficulties, we introduce M2ATNet, a single-stage framework built on a multi-scale multi-attention and Transformer architecture. First, to address texture blurring and residual noise, we design a multi-scale multi-attention denoising module (MMAD), which is applied separately to the luminance and color channels to enhance structural and texture modeling. Second, to solve the misalignment of the luminance and color channels, we introduce the multi-channel feature fusion Transformer (CFFT) module, which recovers dark details and corrects color shifts through cross-channel alignment and deep feature interaction. To guide the model toward more stable and efficient learning, we also fuse multiple loss functions into a hybrid loss term. We extensively evaluate the proposed method on standard datasets, including LOL-v1, LOL-v2, DICM, LIME, and NPE. Evaluations of numerical metrics and visual quality demonstrate that M2ATNet consistently outperforms existing advanced approaches. Ablation studies further confirm the critical roles of the MMAD and CFFT modules in detail preservation and visual fidelity under challenging illumination-deficient environments.
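The hybrid loss mentioned above is a weighted sum of several criteria. The abstract does not list the exact terms, so the sketch below combines a mean absolute error with a crude gradient-difference term on 1-D signals; the terms and weights `w1`, `w2` are illustrative assumptions only.

```python
# Minimal sketch of a hybrid loss blending several criteria into one
# scalar objective. The components stand in for whatever terms the
# paper actually uses.

def mae(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def grad_diff(pred, target):
    # Penalize mismatch in first differences (a crude edge/structure term).
    gp = [b - a for a, b in zip(pred, pred[1:])]
    gt = [b - a for a, b in zip(target, target[1:])]
    return sum(abs(p - t) for p, t in zip(gp, gt)) / len(gp)

def hybrid_loss(pred, target, w1=1.0, w2=0.5):
    return w1 * mae(pred, target) + w2 * grad_diff(pred, target)

loss = hybrid_loss([0.0, 0.5, 1.0], [0.0, 1.0, 1.0])
```

A perfect prediction drives both terms to zero, while structural mismatch is penalized even when average intensity error is small.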
Low-light image enhancement aims to improve the visibility of severely degraded images captured under insufficient illumination, alleviating the adverse effects of illumination degradation on image quality. Traditional Retinex-based approaches, inspired by human visual perception of brightness and color, decompose an image into illumination and reflectance components to restore fine details. However, their limited capacity for handling noise and complex lighting conditions often leads to distortions and artifacts in the enhanced results, particularly under extreme low-light scenarios. Although deep learning methods built upon Retinex theory have recently advanced the field, most still suffer from insufficient interpretability and sub-optimal enhancement performance. This paper presents RetinexWT, a novel framework that tightly integrates classical Retinex theory with modern deep learning. Following Retinex principles, RetinexWT employs wavelet transforms to estimate illumination maps for brightness adjustment. A detail-recovery module that synergistically combines Vision Transformer (ViT) and wavelet transforms is then introduced to guide the restoration of lost details, thereby improving overall image quality. Within the framework, wavelet decomposition splits input features into high-frequency and low-frequency components, enabling scale-specific processing of global illumination/color cues and fine textures. Furthermore, a gating mechanism selectively fuses down-sampled and up-sampled features, while an attention-based fusion strategy enhances model interpretability. Extensive experiments on the LOL dataset demonstrate that RetinexWT surpasses existing Retinex-oriented deep learning methods, achieving an average Peak Signal-to-Noise Ratio (PSNR) improvement of 0.22 dB over the current state of the art (SOTA), thereby confirming its superiority in low-light image enhancement. Code is available at https://github.com/CHEN-hJ516/RetinexWT (accessed on 14 October 2025).
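The low/high-frequency split attributed to RetinexWT is the standard wavelet decomposition. A single-level 1-D Haar transform illustrates it; real use would apply a 2-D transform to image features, and the input signal here is an illustrative assumption.

```python
# Minimal sketch of a single-level 1-D Haar wavelet decomposition:
# pairwise averages carry the low-frequency (smooth) content and
# pairwise differences carry the high-frequency (detail) content.

import math

def haar_decompose(signal):
    s = 1 / math.sqrt(2)
    low = [(a + b) * s for a, b in zip(signal[0::2], signal[1::2])]
    high = [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]
    return low, high

def haar_reconstruct(low, high):
    s = 1 / math.sqrt(2)
    out = []
    for l, h in zip(low, high):
        out.extend([(l + h) * s, (l - h) * s])
    return out

low, high = haar_decompose([4.0, 4.0, 2.0, 0.0])
```

The transform is invertible, so scale-specific processing of the two bands loses nothing when the bands are recombined.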
Underwater images often degrade the effectiveness of underwater visual tasks due to problems such as light scattering, color distortion, and detail blurring, limiting their application performance. Existing underwater image enhancement methods, although they can improve image quality to some extent, often lead to problems such as detail loss and edge blurring. To address these problems, we propose FENet, an efficient underwater image enhancement method. FENet first obtains images at three different scales by downsampling and then transforms them into the frequency domain to extract the low-frequency and high-frequency spectra, respectively. Next, a distance mask and a mean mask are constructed from the distance and the magnitude mean to enhance the high-frequency part, improving image details, while noise in the low-frequency part is suppressed to strengthen the enhancement effect. Because of the light scattering in underwater images, some details are lost if the result is converted directly back to the spatial domain after the frequency-domain operations. For this reason, we propose a multi-stage residual feature aggregation module, which focuses on detail extraction and effectively avoids the information loss caused by global enhancement. Finally, we combine an edge guidance strategy to further enhance the edge details of the image. Experimental results indicate that FENet outperforms current state-of-the-art underwater image enhancement methods in quantitative and qualitative evaluations on multiple publicly available datasets.
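The distance mask described above separates a centered spectrum by radial distance from the DC component. As a rough illustration under assumed parameters (grid size and radius are not from the paper), the mask can be built like this:

```python
# Minimal sketch of a radial "distance mask" over a centered 2-D
# spectrum: frequencies farther than `radius` from the spectrum center
# are flagged as high frequency (1); the rest are low frequency (0).

def distance_mask(rows, cols, radius):
    cy, cx = rows // 2, cols // 2  # spectrum center after an fftshift
    mask = []
    for y in range(rows):
        row = []
        for x in range(cols):
            d2 = (y - cy) ** 2 + (x - cx) ** 2
            row.append(1 if d2 > radius ** 2 else 0)
        mask.append(row)
    return mask

mask = distance_mask(5, 5, 1)
```

Multiplying a shifted spectrum elementwise by `mask` keeps only high-frequency content; the complement selects the low-frequency band for noise suppression.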
Roadbed disease detection is essential for maintaining road functionality. Ground penetrating radar (GPR) enables non-destructive detection without drilling. However, current identification often relies on manual inspection, which requires extensive experience, suffers from low efficiency, and is highly subjective. As the results are presented as radar images, image processing methods can be applied for fast and objective identification. Deep learning-based approaches now offer a robust solution for automated roadbed disease detection. This study proposes an enhanced Faster Region-based Convolutional Neural Network (R-CNN) framework integrating ResNet-50 as the backbone and the two-dimensional discrete Fourier transform (2D-DFT) for frequency-domain feature fusion. A dedicated GPR image dataset comprising 1650 annotated images was constructed and augmented to 6600 images via median filtering, histogram equalization, and binarization. The proposed model segments defect regions, applies binary masking, and fuses frequency-domain features to improve small-target detection under noisy backgrounds. Experimental results show that the improved Faster R-CNN achieves a mean Average Precision (mAP) of 0.92, a 0.22 increase over the baseline. Precision improved by 26% while recall remained stable at 87%. The model was further validated on real urban road data, demonstrating robust detection capability even under interference. These findings highlight the potential of combining GPR with deep learning for efficient, non-destructive roadbed health monitoring.
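The 2D-DFT used for frequency-domain feature fusion can be written directly from its definition. The naive version below is O(N^4) and only suitable for tiny arrays (real pipelines use an FFT); the 2x2 input is an illustrative assumption.

```python
# Minimal sketch of a naive two-dimensional discrete Fourier transform:
# F[u][v] = sum over (y, x) of img[y][x] * exp(-2*pi*i*(u*y/R + v*x/C)).

import cmath

def dft2(img):
    rows, cols = len(img), len(img[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    acc += img[y][x] * cmath.exp(
                        -2j * cmath.pi * (u * y / rows + v * x / cols))
            out[u][v] = acc
    return out

spectrum = dft2([[1.0, 2.0], [3.0, 4.0]])
```

The DC term `spectrum[0][0]` equals the sum of all pixels, a quick sanity check for any DFT implementation.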
This study integrates explicit input enhancement into comparative continuation writing, defined as a task in which learners produce a continuation by comparing their own expression with an input text, aligning with its discourse structure and linguistic features while developing their own ideas. It aims to examine whether English as a Foreign Language (EFL) learners in China exhibit differences in discourse competence and writing performance when completing comparative continuation writing combined with different input enhancement techniques, and whether the alignment effect occurs at the discourse level. Sixty first-year Chinese senior middle school students were divided into four groups: three groups engaged in comparative continuation writing with varying input enhancement, achieved by combining different techniques, while a control group performed a designated-topic writing task. The results revealed that the three comparative continuation writing groups outperformed the designated-topic writing group in discourse competence, particularly in the use of temporal connectives. However, differences and some inconsistencies were observed among the comparative continuation writing groups across individual indices. The study highlights effective ways to incorporate comparative continuation writing into English instruction and demonstrates how explicit input enhancement can complement the task, simultaneously activating the alignment effect proposed by the xu-argument and enhancing discourse competence in writing.
MnOx-CeO2 catalysts for the low-temperature selective catalytic reduction (SCR) of NO remain vulnerable to water and sulfur poisoning, limiting their practical applications. Herein, we report a hydrophobically modified MnOx-CeO2 catalyst that achieves enhanced NO conversion and stability under harsh conditions. The catalyst was synthesized by decorating MnOx crystals with amorphous CeO2, followed by loading hydrophobic silica on the external surfaces. The hydrophobic silica allowed the adsorption of NH3 and NO and the diffusion of H, suppressed the adsorption of H2O, and prevented SO2 from interacting with the Mn active sites, achieving selective molecular discrimination at the catalyst surface. At 120 °C, under H2O and SO2 exposure, the optimal hydrophobic catalyst maintains 82% NO conversion, compared with 69% for the unmodified catalyst. The average adsorption energies of NH3, H2O, and SO2 decreased by 0.05, 0.43, and 0.52 eV, respectively. The NO reduction pathway follows the Eley-Rideal mechanism, NH3* + * → NH2* + H* followed by NH2* + NO* → N2* + H2O*, with NH3 dehydrogenation being the rate-determining step. Hydrophobic modification increased the activation energy for H atom transfer, leading to a minor decrease in the NO conversion at 120 °C. This work demonstrates a viable strategy for developing robust NH3-SCR catalysts capable of efficient operation in water- and sulfur-rich environments.
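The claim that a higher H-transfer barrier slows the rate follows from the Arrhenius relation k = A exp(-Ea / (kB T)), where the prefactor cancels in a ratio. The barrier increase of 0.1 eV used below is an illustrative assumption, not a value from the paper; it only shows the scale of the effect at the 120 °C operating point.

```python
# Worked example: ratio of rate constants when the activation energy
# increases by delta_ea_ev, at temperature temp_k (Arrhenius relation).

import math

KB_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def rate_ratio(delta_ea_ev, temp_k):
    # k_new / k_old = exp(-delta_Ea / (kB * T)); the prefactor A cancels.
    return math.exp(-delta_ea_ev / (KB_EV * temp_k))

ratio = rate_ratio(0.1, 393.15)  # 120 °C expressed in kelvin
```

Any positive barrier increase yields a ratio below one, i.e. a slower elementary step, consistent with the observed minor drop in NO conversion.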
Lateral movement represents the most covert and critical phase of Advanced Persistent Threats (APTs), and its detection still faces two primary challenges: sample scarcity and the "cold start" of new entities. To address these challenges, we propose an Uncertainty-Driven Graph Embedding-Enhanced Lateral Movement Detection framework (UGEA-LMD). First, the framework employs event-level incremental encoding on a continuous-time graph to capture fine-grained behavioral evolution, enabling newly appearing nodes to retain temporal contextual awareness even in the absence of historical interactions, thereby fundamentally mitigating the cold-start problem. Second, in the embedding space, we model the dependency structure among feature dimensions using a Gaussian copula to quantify the uncertainty distribution, and generate augmented samples with consistent structural and semantic properties through adaptive sampling, thus expanding the representation space of sparse samples and enhancing the model's generalization under sparse-sample conditions. Unlike static graph methods that cannot model temporal dependencies, or data augmentation techniques that depend on predefined structures, UGEA-LMD offers both superior temporal-dynamic modeling and structural generalization. Experimental results on the large-scale LANL log dataset demonstrate that, under the transductive setting, UGEA-LMD achieves an AUC of 0.9254; even when 10% of nodes or edges are withheld during training, UGEA-LMD significantly outperforms baseline methods on metrics such as recall and AUC, confirming its robustness and generalization capability in sparse-sample and cold-start scenarios.
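The Gaussian copula step can be illustrated in two dimensions: draw correlated standard normals, then push each through the normal CDF to obtain dependent uniforms on which any marginal distribution can be layered. The correlation value and sample count below are illustrative assumptions; the paper's adaptive sampling is not reproduced here.

```python
# Minimal sketch of sampling from a 2-D Gaussian copula.

import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_copula_samples(rho, n, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Correlate via the Cholesky factor of [[1, rho], [rho, 1]].
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append((norm_cdf(z1), norm_cdf(z2)))  # dependent uniforms
    return out

samples = gaussian_copula_samples(0.8, 2000)
```

The uniforms preserve the dependence structure while leaving the marginals free, which is what makes copula-based augmentation structurally consistent.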
To enhance speech emotion recognition capability, this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup (AAM) and improved coordinate and shuffle attention (ICASA) methods. The AAM method optimizes data augmentation by combining a sample selection strategy with dynamic interpolation coefficients, enabling information fusion of speech data with different emotions at the acoustic level. The ICASA method enhances feature extraction through dynamic fusion of the improved coordinate attention (ICA) and shuffle attention (SA) techniques. The ICA technique reduces computational overhead by employing depthwise-separable convolution and an h-swish activation function, and captures long-range dependencies of multi-scale time-frequency features using attention weights. The SA technique promotes feature interaction through channel shuffling, which helps the model learn richer and more discriminative emotional features. Experimental results demonstrate that, compared to the baseline model, the proposed model improves the weighted accuracy by 5.42% and 4.54%, and the unweighted accuracy by 3.37% and 3.85%, on the IEMOCAP and RAVDESS datasets, respectively. These improvements were confirmed to be statistically significant by independent-samples t-tests, further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.
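AAM builds on the standard mixup idea: linearly interpolate two feature vectors and their labels with a coefficient lambda. The sample selection strategy and dynamic coefficients of AAM are not specified in the abstract; the vectors, one-hot labels, and fixed `lam` below are illustrative assumptions.

```python
# Minimal sketch of mixup-style augmentation on acoustic features.

def mixup(x1, x2, y1, y2, lam):
    # Interpolate features and labels with the same coefficient so the
    # soft label reflects how much of each source sample is present.
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

# Two toy feature vectors with one-hot labels for "angry" and "sad".
x_mix, y_mix = mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], lam=0.75)
```

The mixed sample sits between the two emotions at the feature level, which is the "information fusion at the acoustic level" the abstract describes.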
Underwater images frequently suffer from chromatic distortion, blurred details, and low contrast, posing significant challenges for enhancement. This paper introduces AquaTree, a novel underwater image enhancement (UIE) method that reformulates the task as a Markov Decision Process (MDP) through the integration of Monte Carlo Tree Search (MCTS) and deep reinforcement learning (DRL). The framework employs an action space of 25 enhancement operators, strategically grouped for basic attribute adjustment, color component balance and correction, and deblurring. Exploration within MCTS is guided by a dual-branch convolutional network, enabling intelligent sequential operator selection. Our core contributions include: (1) a multimodal state representation combining CIELab color histograms with deep perceptual features, (2) a dual-objective reward mechanism optimizing chromatic fidelity and perceptual consistency, and (3) an alternating training strategy co-optimizing enhancement sequences and network parameters. We further propose two inference schemes: an MCTS-based approach prioritizing accuracy at higher computational cost, and an efficient network policy enabling real-time processing with minimal quality loss. Comprehensive evaluations on the UIEB dataset, together with color-correction and haze-removal comparisons on the U45 dataset, demonstrate AquaTree's superiority, significantly outperforming nine state-of-the-art methods across five established underwater image quality metrics.
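Child selection in MCTS-style search is classically done with the UCB1 rule, which trades off a node's value estimate against an exploration bonus. The visit counts and values below are illustrative assumptions; AquaTree's actual exploration is guided by a learned network prior rather than plain UCB1.

```python
# Minimal sketch of UCB1 selection over the children of an MCTS node.

import math

def ucb1_select(values, visits, total_visits, c=1.4):
    # Returns the index of the child maximizing value + exploration bonus.
    best_i, best_score = -1, -math.inf
    for i, (v, n) in enumerate(zip(values, visits)):
        if n == 0:
            return i  # always expand unvisited children first
        score = v + c * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best_i, best_score = i, score
    return best_i

choice = ucb1_select(values=[0.6, 0.5], visits=[10, 2], total_visits=12)
```

Here the second child wins despite its lower value because it has been visited far less, illustrating how the search keeps exploring alternative enhancement operators.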
Recently, a multitude of techniques that fuse deep learning with Retinex theory have been utilized in the field of low-light image enhancement, yielding remarkable outcomes. Due to the intricate nature of imaging scenarios, including fluctuating noise levels and unpredictable environmental elements, these techniques do not fully resolve these challenges. We introduce an innovative strategy that builds upon Retinex theory and integrates a novel deep network architecture, merging the Convolutional Block Attention Module (CBAM) with the Transformer. Our model is capable of detecting more prominent features across both channel and spatial domains. We have conducted extensive experiments across several datasets, namely LOLv1, LOLv2-real, and LOLv2-sync. The results show that our approach surpasses other methods when evaluated against critical metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). Moreover, we have visually assessed images enhanced by various techniques and utilized perceptual metrics such as LPIPS for comparison, and the experimental data clearly demonstrate that our approach excels visually over other methods as well.
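CBAM's channel branch weights each feature channel from its pooled descriptors. The sketch below reduces the module's shared MLP to an identity and works on toy flat channels, so everything beyond the pooling-plus-sigmoid pattern is an illustrative assumption.

```python
# Minimal sketch of CBAM-style channel attention: per-channel average
# and max pooling are fused and squashed through a sigmoid to produce
# weights that rescale the input channels.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feats):
    # feats: list of channels, each a flat list of spatial activations.
    weights = []
    for ch in feats:
        avg = sum(ch) / len(ch)
        mx = max(ch)
        weights.append(sigmoid(avg + mx))  # descriptor fusion; MLP omitted
    return [[w * v for v in ch] for w, ch in zip(weights, feats)], weights

scaled, w = channel_attention([[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]])
```

The strongly activated channel receives a weight near one while the flat channel is damped, which is how the module emphasizes "more prominent features" along the channel dimension.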
Unmanned aerial vehicle (UAV) images captured under low-light conditions often suffer from noise and uneven illumination. To address these issues, we propose a low-light image enhancement algorithm for UAV images that is inspired by Retinex theory and guided by a light weighted map. First, we propose a new network for reflectance component processing to suppress the noise in images. Second, we construct an illumination enhancement module that uses the light weighted map to guide the enhancement process. Finally, the processed reflectance and illumination components are recombined to obtain the enhanced results. Experimental results show that our method can suppress noise while enhancing image brightness, and prevents over-enhancement in bright regions. Code and data are available at https://gitee.com/baixiaotong2/uav-images.git.
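The Retinex recombination step models an image I as reflectance R times illumination L, so after the illumination component is enhanced the result is R * L'. The gamma lift and toy pixel values below are illustrative assumptions standing in for the paper's learned illumination enhancement.

```python
# Minimal sketch of Retinex-style recombination: brighten the
# illumination component (here a gamma lift), then multiply it back
# with the reflectance. Pixel values are toy scalars in [0, 1].

def retinex_recombine(reflectance, illumination, gamma=0.5):
    enhanced_l = [l ** gamma for l in illumination]  # lifts dark regions most
    return [r * l for r, l in zip(reflectance, enhanced_l)]

# A dark pixel (illumination 0.04) is lifted far more than a bright one (0.81).
out = retinex_recombine([1.0, 1.0], [0.04, 0.81])
```

Because the gamma curve compresses high values, bright regions are barely changed, matching the stated goal of preventing over-enhancement there.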
In low-light environments, captured images often exhibit issues such as insufficient clarity and detail loss, which significantly degrade the accuracy of subsequent target recognition tasks. To tackle these challenges, this study presents a novel low-light image enhancement algorithm that leverages virtual hazy image generation through dehazing models based on statistical analysis. The proposed algorithm initiates the enhancement process by transforming the low-light image into a virtual hazy image, followed by image segmentation using a quadtree method. To improve the accuracy and robustness of atmospheric light estimation, the algorithm incorporates a genetic algorithm to optimize the quadtree-based estimation of atmospheric light regions. Additionally, the method employs an adaptive window adjustment mechanism to derive the dark channel prior image, which is subsequently refined using morphological operations and guided filtering. The final enhanced image is reconstructed through the hazy image degradation model. Extensive experimental evaluations across multiple datasets verify the superiority of the designed framework, achieving a peak signal-to-noise ratio (PSNR) of 17.09 and a structural similarity index (SSIM) of 0.74. These results indicate that the proposed algorithm not only effectively enhances image contrast and brightness but also outperforms traditional methods in terms of subjective and objective evaluation metrics.
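The dark channel prior image referred to above is computed by taking the per-pixel minimum over the RGB channels and then a minimum over a local window. The sketch fixes a 3x3 window; the abstract's adaptive window adjustment is omitted, and the tiny image is an illustrative assumption.

```python
# Minimal sketch of the dark channel prior computation.

def dark_channel(img, radius=1):
    # img[y][x] = (r, g, b); returns the windowed min-channel map.
    h, w = len(img), len(img[0])
    min_rgb = [[min(px) for px in row] for row in img]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            out[y][x] = min(min_rgb[yy][xx] for yy in ys for xx in xs)
    return out

dc = dark_channel([[(0.9, 0.8, 0.7), (0.5, 0.6, 0.7)],
                   [(0.2, 0.3, 0.4), (1.0, 1.0, 1.0)]])
```

Low dark-channel values indicate haze-free content, which is why the prior drives both transmission estimation and atmospheric light selection in dehazing-based pipelines.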
AIM: To find an effective contrast enhancement method for retinal images that enables effective segmentation of retinal features. METHODS: A novel image preprocessing method that used neighbourhood-based improved contrast limited adaptive histogram equalization (NICLAHE) to improve retinal image contrast was proposed to aid in the accurate identification of retinal disorders and improve the visibility of fine retinal structures. Additionally, a minimal-order filter was applied to effectively denoise the images without compromising important retinal structures. The novel NICLAHE algorithm was inspired by the classical CLAHE algorithm, but enhanced it by selecting the clip limits and tile sizes dynamically, relative to the pixel values in an image, as opposed to using fixed values. It was evaluated on the DRIVE and high-resolution fundus (HRF) datasets using conventional quality measures. RESULTS: The proposed preprocessing technique was applied to two retinal image databases, DRIVE and HRF, with four quality metrics: root mean square error (RMSE), peak signal-to-noise ratio (PSNR), root mean square contrast (RMSC), and overall contrast. The technique performed superiorly on both datasets compared to traditional enhancement methods. To assess the compatibility of the method with automated diagnosis, a deep learning framework named ResNet was applied to the segmentation of retinal blood vessels. Sensitivity, specificity, precision, and accuracy were used to analyse the performance. NICLAHE-enhanced images outperformed the traditional techniques on both datasets with improved accuracy. CONCLUSION: NICLAHE provides better results than traditional methods, with less error and improved contrast-related values. The enhanced images, subsequently measured by sensitivity, specificity, precision, and accuracy, yield better results on both datasets.
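The CLAHE mechanism NICLAHE builds on clips each tile's histogram at a limit and redistributes the excess so no grey level dominates local equalization. Deriving the limit from tile statistics (here the mean bin count times a factor) stands in for NICLAHE's dynamic selection, which the abstract does not fully specify.

```python
# Minimal sketch of CLAHE-style histogram clipping with a
# statistics-based clip limit.

def clip_histogram(hist, factor=2.0):
    limit = factor * sum(hist) / len(hist)  # dynamic, not a fixed constant
    excess = sum(max(0, h - limit) for h in hist)
    clipped = [min(h, limit) for h in hist]
    bonus = excess / len(hist)  # spread the clipped mass uniformly
    return [h + bonus for h in clipped]

out = clip_histogram([10, 0, 0, 2], factor=2.0)
```

The total count is preserved, so the cumulative mapping built from the clipped histogram still spans the full intensity range while limiting contrast amplification.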
The traditional FastSpeech2 offers high generation efficiency and speech naturalness, but it still has limitations in metrical modeling, especially the lack of an effective link between semantics and metre. To enhance the rhythmic expression of synthesized speech, this study proposes ProsodySpeech, a speech synthesis system that incorporates the BERT pre-trained language model. By introducing a Pre-trained Language Model Adapter (PLM Adapter) and a Semantic-Prosody Mapping Network (SPMN), and by fully utilizing the deep semantic information extracted by BERT, the system strengthens its control over rhythmic features such as pitch, energy, and duration. The proposed model achieves effective alignment and mapping between semantic information and prosody parameters by introducing a shared semantic processing layer, a global self-attention mechanism, and a specially designed prosody mapping branch. Experimental results show that the proposed model outperforms VITS and StyleTTS2 in Mean Opinion Score (MOS), and the synthesized speech shows a clear advantage in rhythmic naturalness and expressive richness, verifying the effectiveness of the proposed model in enhancing the expression of speech rhythm and bringing the synthesized speech closer to natural human speech.
Edge structures are ubiquitous in the processing and fabrication of various optoelectronic devices. Novel physical properties and enhanced light-matter interactions are anticipated to occur at crystal edges due to the broken spatial translational symmetry. However, the intensity of first-order Raman scattering at crystal edges has rarely been explored, although the mechanical stress and edge characteristics have been thoroughly studied via the Raman peak shift and the spectral features of the edge-related Raman modes. Here, taking a GaAs crystal with a well-defined edge as an example, we reveal the intensity enhancement of Raman-active modes and the emergence of Raman-forbidden modes under specific polarization configurations at the edge. This is attributed to the presence of a hot spot at the edge due to the redistributed electromagnetic fields and the electromagnetic wave propagation of the incident laser and Raman signal near the edge, which are confirmed by finite-difference time-domain simulations. Spatially resolved Raman intensities of both Raman-active and Raman-forbidden modes near the edge are calculated based on the redistributed electromagnetic fields, which quantitatively reproduce the corresponding experimental results. These findings offer new insights into the intensity enhancement of Raman scattering at crystal edges and present a new avenue to manipulate light-matter interactions of crystals by manufacturing various types of edges, and to characterize edge structures in photonic and optoelectronic devices.
Low-light image enhancement is one of the most active research areas in the field of computer vision in recent years. During low-light image enhancement, loss of image detail and an increase in noise occur inevitably, influencing the quality of enhanced images. To alleviate this problem, a low-light image enhancement model called RetinexNet, based on Retinex theory, was proposed in this study. The model is composed of an image decomposition module and a brightness enhancement module. In the decomposition module, a convolutional block attention module (CBAM) was incorporated to enhance the feature representation capacity of the network, focusing on crucial features and suppressing irrelevant ones. A multi-feature fusion denoising module was designed within the brightness enhancement module, circumventing the issue of feature loss during downsampling. The proposed model outperforms existing algorithms in terms of PSNR and SSIM metrics on the publicly available LOL and MIT-Adobe FiveK datasets, and gives superior results in terms of the NIQE metric on the publicly available LIME dataset.
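The PSNR metric cited throughout these abstracts is defined as 10 log10(MAX^2 / MSE) for images with peak value MAX. The toy pixel arrays below are illustrative assumptions used only to demonstrate the computation.

```python
# Worked example of the PSNR metric on flat pixel lists in [0, 1].

import math

def psnr(pred, target, max_val=1.0):
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

value = psnr([0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.6])
```

A single 0.1 error across four pixels gives an MSE of 0.0025 and hence a PSNR of about 26 dB, which puts reported gains like "+0.22 dB" into perspective.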
Underwater images are inherently degraded by color distortion, contrast reduction, and uneven brightness, primarily due to light absorption and scattering in water. To mitigate these challenges, a novel enhancement approach is proposed, integrating Local Adaptive Color Correction (LACC) with contrast enhancement based on adaptive Rayleigh distribution stretching and CLAHE (LACC-RCE). Conventional color correction methods predominantly employ global adjustment strategies, which are often inadequate for handling spatially varying color distortions. In contrast, the proposed LACC method incorporates local color analysis, tone-weighted control, and spatially adaptive adjustments, allowing for region-specific color correction. This approach effectively enhances color fidelity and perceptual naturalness, addressing the limitations of global correction techniques. For contrast enhancement, the proposed method leverages the global mapping characteristics of the Rayleigh distribution to improve overall contrast, while CLAHE is employed to adaptively enhance local regions. A weighted fusion strategy is then applied to synthesize high-quality underwater images. Experimental results indicate that LACC-RCE surpasses conventional methods in color restoration, contrast optimization, and detail preservation, thereby enhancing the visual quality of underwater images. This improvement facilitates more reliable inputs for underwater object detection and recognition tasks.
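Rayleigh-distribution stretching maps each normalized intensity through the inverse Rayleigh CDF, x -> sigma * sqrt(-2 ln(1 - x)), spreading mid-tones toward the Rayleigh shape. The `sigma` value and sample intensities below are illustrative assumptions; the paper's adaptive variant tunes the mapping per image.

```python
# Minimal sketch of Rayleigh-distribution contrast stretching via the
# inverse Rayleigh CDF.

import math

def rayleigh_stretch(intensities, sigma=0.4):
    out = []
    for x in intensities:
        x = min(x, 0.999999)  # guard against log(0) at full intensity
        out.append(sigma * math.sqrt(-2.0 * math.log(1.0 - x)))
    return out

stretched = rayleigh_stretch([0.1, 0.5, 0.9])
```

The mapping is monotone, so pixel ordering (and thus scene structure) is preserved while the global contrast distribution is reshaped; CLAHE then handles the local regions.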
Funding: supported by the National Natural Science Foundation of China, grant numbers 52374156 and 62476005.
Abstract: Images taken in dim environments frequently exhibit issues like insufficient brightness, noise, color shifts, and loss of detail. These problems pose significant challenges to dark image enhancement tasks. Current approaches, while effective in global illumination modeling, often struggle to simultaneously suppress noise and preserve structural details, especially under heterogeneous lighting. Furthermore, misalignment between luminance and color channels introduces additional challenges to accurate enhancement. In response to the aforementioned difficulties, we introduce a single-stage framework, M2ATNet, using a multi-scale multi-attention and Transformer architecture. First, to address the problems of texture blurring and residual noise, we design a multi-scale multi-attention denoising module (MMAD), which is applied separately to the luminance and color channels to enhance the structural and texture modeling capabilities. Secondly, to solve the non-alignment problem of the luminance and color channels, we introduce the multi-channel feature fusion Transformer (CFFT) module, which effectively recovers dark details and corrects color shifts through cross-channel alignment and deep feature interaction. To guide the model to learn more stably and efficiently, we also fuse multiple types of loss functions to form a hybrid loss term. We extensively evaluate the proposed method on various standard datasets, including LOL-v1, LOL-v2, DICM, LIME, and NPE. Evaluations in terms of numerical metrics and visual quality demonstrate that M2ATNet consistently outperforms existing advanced approaches. Ablation studies further confirm the critical contributions of the MMAD and CFFT modules to detail preservation and visual fidelity under challenging illumination-deficient environments.
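The hybrid-loss idea of fusing multiple loss types can be illustrated with a minimal NumPy sketch. The particular terms and weights below (L1, L2, and a horizontal-gradient structure term, weighted 0.5/0.3/0.2) are assumptions for illustration, not the loss composition used by M2ATNet.

```python
import numpy as np

def hybrid_loss(pred, target, w_l1=0.5, w_l2=0.3, w_grad=0.2):
    """Weighted fusion of L1, L2, and a simple gradient (edge) term.
    Weights are illustrative, not the paper's values."""
    l1 = np.mean(np.abs(pred - target))
    l2 = np.mean((pred - target) ** 2)
    # horizontal-gradient difference as a cheap structure-preservation term
    g = np.mean(np.abs(np.diff(pred, axis=1) - np.diff(target, axis=1)))
    return w_l1 * l1 + w_l2 * l2 + w_grad * g

a = np.zeros((4, 4)); b = np.ones((4, 4))
print(round(hybrid_loss(a, b), 3))   # 0.8  (0.5*1 + 0.3*1 + 0.2*0)
```

Combining a pixel-wise term with a gradient term is a common way to trade off denoising against detail preservation; real frameworks often add perceptual or SSIM terms as well.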
Funding: supported in part by the National Natural Science Foundation of China [Grant number 62471075] and the Major Science and Technology Project Grant of the Chongqing Municipal Education Commission [Grant number KJZD-M202301901].
Abstract: Low-light image enhancement aims to improve the visibility of severely degraded images captured under insufficient illumination, alleviating the adverse effects of illumination degradation on image quality. Traditional Retinex-based approaches, inspired by human visual perception of brightness and color, decompose an image into illumination and reflectance components to restore fine details. However, their limited capacity for handling noise and complex lighting conditions often leads to distortions and artifacts in the enhanced results, particularly under extreme low-light scenarios. Although deep learning methods built upon Retinex theory have recently advanced the field, most still suffer from insufficient interpretability and sub-optimal enhancement performance. This paper presents RetinexWT, a novel framework that tightly integrates classical Retinex theory with modern deep learning. Following Retinex principles, RetinexWT employs wavelet transforms to estimate illumination maps for brightness adjustment. A detail-recovery module that synergistically combines Vision Transformer (ViT) and wavelet transforms is then introduced to guide the restoration of lost details, thereby improving overall image quality. Within the framework, wavelet decomposition splits input features into high-frequency and low-frequency components, enabling scale-specific processing of global illumination/color cues and fine textures. Furthermore, a gating mechanism selectively fuses down-sampled and up-sampled features, while an attention-based fusion strategy enhances model interpretability. Extensive experiments on the LOL dataset demonstrate that RetinexWT surpasses existing Retinex-oriented deep learning methods, achieving an average Peak Signal-to-Noise Ratio (PSNR) improvement of 0.22 dB over the current state of the art (SOTA), thereby confirming its superiority in low-light image enhancement. Code is available at https://github.com/CHEN-hJ516/RetinexWT (accessed on 14 October 2025).
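The wavelet split into low- and high-frequency components can be made concrete with a one-level 2-D Haar transform. This is a minimal sketch of the generic technique (RetinexWT's actual wavelet choice and network integration are not specified here); the Haar pair below reconstructs the input exactly, so the low band can carry illumination/color cues while the high bands carry textures.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar transform: returns the low-frequency (LL) and
    high-frequency (LH, HL, HH) sub-bands. Height/width must be even."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2 (exact reconstruction)."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x
```

Because the transform is invertible, a network can process each sub-band at its own scale and still recombine them without information loss.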
Funding: supported in part by the National Natural Science Foundation of China [Grant number 62471075] and the Major Science and Technology Project Grant of the Chongqing Municipal Education Commission [Grant number KJZD-M202301901].
Abstract: Underwater images often limit the effectiveness of underwater visual tasks due to problems such as light scattering, color distortion, and detail blurring, restricting their application performance. Existing underwater image enhancement methods, although they can improve image quality to some extent, often lead to problems such as detail loss and edge blurring. To address these problems, we propose FENet, an efficient underwater image enhancement method. FENet first obtains three different scales of images by image downsampling and then transforms them into the frequency domain to extract the low-frequency and high-frequency spectra, respectively. Then, a distance mask and a mean mask are constructed based on the distance and magnitude mean for enhancing the high-frequency part, thus improving image details, and the enhancement effect is further improved by suppressing noise in the low-frequency part. However, owing to light scattering in underwater images, some details are lost if the result is converted directly back to the spatial domain after the frequency-domain operation. For this reason, we propose a multi-stage residual feature aggregation module, which focuses on detail extraction and effectively avoids the information loss caused by global enhancement. Finally, we combine an edge guidance strategy to further enhance the edge details of the image. Experimental results indicate that FENet outperforms current state-of-the-art underwater image enhancement methods in quantitative and qualitative evaluations on multiple publicly available datasets.
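The distance-mask idea can be sketched with a plain FFT low/high split. The circular distance mask below is a minimal stand-in (the cutoff `radius` and the omission of FENet's magnitude-mean mask are assumptions): frequencies near the spectrum centre form the low-frequency part, the rest the high-frequency part, and the two parts sum back to the original image.

```python
import numpy as np

def split_frequency(img, radius=8):
    """Split an image into low- and high-frequency parts using a circular
    distance mask in the centred FFT spectrum (radius is illustrative)."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = (dist <= radius).astype(float)
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(f * (1 - low_mask))).real
    return low, high
```

Enhancement then amounts to amplifying `high` (details) and attenuating noise in `low` before recombining, which is the separation FENet exploits.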
Funding: supported by the Second Batch of Key Textbook Construction Projects of the "14th Five-Year Plan" of Zhejiang Vocational Colleges (SZDJC-2412).
Abstract: Roadbed disease detection is essential for maintaining road functionality. Ground penetrating radar (GPR) enables non-destructive detection without drilling. However, current identification often relies on manual inspection, which requires extensive experience, suffers from low efficiency, and is highly subjective. As the results are presented as radar images, image processing methods can be applied for fast and objective identification. Deep learning-based approaches now offer a robust solution for automated roadbed disease detection. This study proposes an enhanced Faster Region-based Convolutional Neural Network (R-CNN) framework integrating ResNet-50 as the backbone and two-dimensional discrete Fourier spectrum transformation (2D-DFT) for frequency-domain feature fusion. A dedicated GPR image dataset comprising 1650 annotated images was constructed and augmented to 6600 images via median filtering, histogram equalization, and binarization. The proposed model segments defect regions, applies binary masking, and fuses frequency-domain features to improve small-target detection under noisy backgrounds. Experimental results show that the improved Faster R-CNN achieves a mean Average Precision (mAP) of 0.92, representing a 0.22 increase over the baseline. Precision improved by 26% while recall remained stable at 87%. The model was further validated on real urban road data, demonstrating robust detection capability even under interference. These findings highlight the potential of combining GPR with deep learning for efficient, non-destructive roadbed health monitoring.
Abstract: This study integrates explicit input enhancement into comparative continuation writing, defined as a task in which learners produce a continuation by comparing their own expression with an input text, aligning with its discourse structure and linguistic features, while developing their own ideas. It aims to examine whether English as a Foreign Language (EFL) learners in China exhibit differences in discourse competence and writing performance when completing comparative continuation writing combined with different input enhancement techniques, and whether the alignment effect occurs at the discourse level. Sixty first-year Chinese senior middle school students were divided into four groups: three groups engaged in comparative continuation writing with varying input enhancement, achieved by combining different techniques, while a control group performed a designated-topic writing task. The results revealed that the three comparative continuation writing groups outperformed the designated-topic writing group in discourse competence, particularly in the use of temporal connectives. However, differences and some inconsistencies were observed among the comparative continuation writing groups across individual indices. The study highlights effective ways to incorporate comparative continuation writing into English instruction and demonstrates how explicit input enhancement can complement the task, simultaneously activating the alignment effect proposed by the xu-argument and enhancing discourse competence in writing.
Funding: financially sponsored by the National Natural Science Foundation of China (No. 52204414), the National Energy-Saving and Low-Carbon Materials Production and Application Demonstration Platform Program, China (No. TC220H06N), the National Key R&D Program of China (No. 2021YFC1910504), and the Fundamental Research Funds for the Central Universities, China (No. FRFTP-20-097A1Z).
Abstract: MnO_(x)-CeO_(2) catalysts for the low-temperature selective catalytic reduction (SCR) of NO remain vulnerable to water and sulfur poisoning, limiting their practical applications. Herein, we report a hydrophobic-modified MnO_(x)-CeO_(2) catalyst that achieves an enhanced NO conversion rate and stability under harsh conditions. The catalyst was synthesized by decorating MnO_(x) crystals with amorphous CeO_(2), followed by loading hydrophobic silica on the external surfaces. The hydrophobic silica allowed the adsorption of NH_(3) and NO and the diffusion of H atoms, suppressed the adsorption of H_(2)O, and prevented SO_(2) interaction with the Mn active sites, achieving selective molecular discrimination at the catalyst surface. At 120 ℃, under H_(2)O and SO_(2) exposure, the optimal hydrophobic catalyst maintains an 82% NO conversion rate compared with 69% for the unmodified catalyst. The average adsorption energies of NH_(3), H_(2)O, and SO_(2) decreased by 0.05, 0.43, and 0.52 eV, respectively. The NO reduction pathway follows the Eley-Rideal mechanism, NH_(3)^(*) + * → NH_(2)^(*) + H^(*) followed by NH_(2)^(*) + NO^(*) → N_(2)^(*) + H_(2)O^(*), with NH_(3) dehydrogenation being the rate-determining step. Hydrophobic modification increased the activation energy for H atom transfer, leading to a minor decrease in the NO conversion rate at 120 ℃. This work demonstrates a viable strategy for developing robust NH_(3)-SCR catalysts capable of efficient operation in water- and sulfur-rich environments.
Funding: supported by the Zhongyuan University of Technology Discipline Backbone Teacher Support Program Project (No. GG202417) and the Key Research and Development Program of Henan under Grant 251111212000.
Abstract: Lateral movement represents the most covert and critical phase of Advanced Persistent Threats (APTs), and its detection still faces two primary challenges: sample scarcity and the "cold start" of new entities. To address these challenges, we propose an Uncertainty-Driven Graph Embedding-Enhanced Lateral Movement Detection framework (UGEA-LMD). First, the framework employs event-level incremental encoding on a continuous-time graph to capture fine-grained behavioral evolution, enabling newly appearing nodes to retain temporal contextual awareness even in the absence of historical interactions and thereby fundamentally mitigating the cold-start problem. Second, in the embedding space, we model the dependency structure among feature dimensions using a Gaussian copula to quantify the uncertainty distribution, and generate augmented samples with consistent structural and semantic properties through adaptive sampling, thus expanding the representation space of sparse samples and enhancing the model's generalization under sparse-sample conditions. Unlike static graph methods that cannot model temporal dependencies, or data augmentation techniques that depend on predefined structures, UGEA-LMD offers both superior temporal-dynamic modeling and structural generalization. Experimental results on the large-scale LANL log dataset demonstrate that, under the transductive setting, UGEA-LMD achieves an AUC of 0.9254; even when 10% of nodes or edges are withheld during training, UGEA-LMD significantly outperforms baseline methods on metrics such as recall and AUC, confirming its robustness and generalization capability in sparse-sample and cold-start scenarios.
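The Gaussian-copula augmentation step can be sketched as follows. This is a generic copula-resampling recipe, not the paper's exact procedure: correlate samples in Gaussian space using the embeddings' correlation matrix, then map each dimension back through its empirical quantiles, so synthetic embeddings preserve both the marginal distributions and the cross-dimension dependency structure.

```python
import numpy as np
from math import erf

def copula_augment(emb, n_new, seed=0):
    """Sample synthetic embeddings via a Gaussian copula: draw correlated
    Gaussian scores, convert to uniforms, then map through each
    dimension's empirical quantiles (a sketch of the general technique)."""
    rng = np.random.default_rng(seed)
    n, d = emb.shape
    cov = np.cov(emb, rowvar=False) + 1e-6 * np.eye(d)   # regularised
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_new)
    u = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))  # Gaussian CDF
    out = np.empty((n_new, d))
    for j in range(d):
        out[:, j] = np.quantile(emb[:, j], u[:, j])  # empirical marginals
    return out
```

Because the marginals come from the observed data, augmented samples stay inside the range seen during training while still varying jointly across dimensions.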
Funding: supported by the National Natural Science Foundation of China under Grant No. 12204062 and the Natural Science Foundation of Shandong Province under Grant No. ZR2022MF330.
Abstract: To enhance speech emotion recognition capability, this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup (AAM) and improved coordinate and shuffle attention (ICASA) methods. The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients, thus enabling information fusion of speech data with different emotions at the acoustic level. The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention (ICA) and shuffle attention (SA) techniques. The ICA technique reduces computational overhead by employing depth-separable convolution and an h-swish activation function and captures long-range dependencies of multi-scale time-frequency features using the attention weights. The SA technique promotes feature interaction through channel shuffling, which helps the model learn richer and more discriminative emotional features. Experimental results demonstrate that, compared to the baseline model, the proposed model improves the weighted accuracy by 5.42% and 4.54%, and the unweighted accuracy by 3.37% and 3.85% on the IEMOCAP and RAVDESS datasets, respectively. These improvements were confirmed to be statistically significant by independent samples t-tests, further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.
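The acoustic-level fusion with dynamic interpolation coefficients builds on standard mixup, which can be sketched in a few lines. The Beta-sampled coefficient below is the standard mixup recipe; AAM's sample selection strategy and its particular coefficient schedule are omitted, and `alpha=0.2` is an assumption.

```python
import numpy as np

def acoustic_mixup(x1, y1, x2, y2, alpha=0.2, seed=0):
    """Mix two (feature, one-hot label) pairs with a Beta-sampled
    coefficient, as in standard mixup (AAM's selection strategy omitted)."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha)          # dynamic interpolation coefficient
    x_mix = lam * x1 + (1 - lam) * x2     # fuse acoustic features
    y_mix = lam * y1 + (1 - lam) * y2     # soft label mirrors the mix
    return x_mix, y_mix
```

Training on such convex combinations exposes the model to intermediate emotional expressions, which is the information-fusion effect the abstract describes.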
Funding: supported by the Hubei Provincial Technology Innovation Special Project and the Natural Science Foundation of Hubei Province under Grants 2023BEB024 and 2024AFC066, respectively.
Abstract: Underwater images frequently suffer from chromatic distortion, blurred details, and low contrast, posing significant challenges for enhancement. This paper introduces AquaTree, a novel underwater image enhancement (UIE) method that reformulates the task as a Markov Decision Process (MDP) through the integration of Monte Carlo Tree Search (MCTS) and deep reinforcement learning (DRL). The framework employs an action space of 25 enhancement operators, strategically grouped for basic attribute adjustment, color component balance, correction, and deblurring. Exploration within MCTS is guided by a dual-branch convolutional network, enabling intelligent sequential operator selection. Our core contributions include: (1) a multimodal state representation combining CIELab color histograms with deep perceptual features, (2) a dual-objective reward mechanism optimizing chromatic fidelity and perceptual consistency, and (3) an alternating training strategy co-optimizing enhancement sequences and network parameters. We further propose two inference schemes: an MCTS-based approach prioritizing accuracy at higher computational cost, and an efficient network policy enabling real-time processing with minimal quality loss. Comprehensive evaluations on the UIEB dataset, together with color-correction and haze-removal comparisons on the U45 dataset, demonstrate AquaTree's superiority, significantly outperforming nine state-of-the-art methods across five established underwater image quality metrics.
Abstract: Recently, a multitude of techniques that fuse deep learning with Retinex theory have been utilized in the field of low-light image enhancement, yielding remarkable outcomes. Due to the intricate nature of imaging scenarios, including fluctuating noise levels and unpredictable environmental elements, these techniques do not fully resolve these challenges. We introduce an innovative strategy that builds upon Retinex theory and integrates a novel deep network architecture, merging the Convolutional Block Attention Module (CBAM) with the Transformer. Our model is capable of detecting more prominent features across both channel and spatial domains. We have conducted extensive experiments across several datasets, namely LOLv1, LOLv2-real, and LOLv2-sync. The results show that our approach surpasses other methods when evaluated against critical metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). Moreover, we have visually assessed images enhanced by various techniques and utilized visual metrics like LPIPS for comparison, and the experimental data clearly demonstrate that our approach excels visually over other methods as well.
Funding: supported by the National Natural Science Foundation of China (Nos. 62201454 and 62306235) and the Xi'an Science and Technology Program of the Xi'an Science and Technology Bureau (No. 23SFSF0004).
Abstract: Unmanned aerial vehicle (UAV) images captured under low-light conditions often suffer from noise and uneven illumination. To address these issues, we propose a low-light image enhancement algorithm for UAV images, which is inspired by Retinex theory and guided by a light weighted map. Firstly, we propose a new network for reflectance component processing to suppress the noise in images. Secondly, we construct an illumination enhancement module that uses a light weighted map to guide the enhancement process. Finally, the processed reflectance and illumination components are recombined to obtain the enhancement results. Experimental results show that our method can suppress the noise in images while enhancing image brightness, and prevent over-enhancement in bright regions. Code and data are available at https://gitee.com/baixiaotong2/uav-images.git.
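The Retinex decompose-enhance-recombine pipeline shared by several of the methods above can be sketched without any learning. The box-blur illumination estimate and the gamma value below are illustrative assumptions standing in for the learned networks: illumination is a smoothed copy of the image, reflectance is the pointwise ratio, and brightening the illumination before recombining enhances the image.

```python
import numpy as np

def box_blur(img, k=3):
    """Mean filter over a k x k window (edge-padded), via shifted sums."""
    pad = k // 2
    p = np.pad(img, pad, mode='edge')
    acc = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            acc += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return acc / (k * k)

def retinex_decompose(img, k=3, eps=1e-6):
    """Split an image into illumination (smooth) and reflectance (ratio)."""
    illum = box_blur(img, k)
    refl = img / (illum + eps)
    return illum, refl

def enhance_lowlight(img, gamma=0.6, k=3):
    """Brighten the illumination component and recombine with reflectance."""
    illum, refl = retinex_decompose(img, k)
    return refl * illum ** gamma   # gamma < 1 lifts dark regions most
```

Learned variants replace the blur with an estimation network and add denoising on the reflectance branch, but the recombination step is the same.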
Funding: supported by the Natural Science Foundation of Shandong Province (Nos. ZR2023MF047, ZR2024MA055, and ZR2023QF139), the Enterprise Commissioned Project (Nos. 2024HX104 and 2024HX140), the China University Industry-University-Research Innovation Foundation (Nos. 2021ZYA11003 and 2021ITA05032), and the Science and Technology Plan for Youth Innovation of Shandong's Universities (No. 2019KJN012).
Abstract: In low-light environments, captured images often exhibit issues such as insufficient clarity and detail loss, which significantly degrade the accuracy of subsequent target recognition tasks. To tackle these challenges, this study presents a novel low-light image enhancement algorithm that leverages virtual hazy image generation through dehazing models based on statistical analysis. The proposed algorithm initiates the enhancement process by transforming the low-light image into a virtual hazy image, followed by image segmentation using a quadtree method. To improve the accuracy and robustness of atmospheric light estimation, the algorithm incorporates a genetic algorithm to optimize the quadtree-based estimation of atmospheric light regions. Additionally, this method employs an adaptive window adjustment mechanism to derive the dark channel prior image, which is subsequently refined using morphological operations and guided filtering. The final enhanced image is reconstructed through the hazy image degradation model. Extensive experimental evaluations across multiple datasets verify the superiority of the designed framework, achieving a peak signal-to-noise ratio (PSNR) of 17.09 and a structural similarity index (SSIM) of 0.74. These results indicate that the proposed algorithm not only effectively enhances image contrast and brightness but also outperforms traditional methods in terms of subjective and objective evaluation metrics.
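The two classical building blocks named here, the dark channel prior and quadtree-based atmospheric light estimation, can be sketched in NumPy. The fixed window size and the "keep the brightest quadrant" recursion are the textbook versions; the paper's adaptive window mechanism and genetic-algorithm refinement are omitted.

```python
import numpy as np

def dark_channel(img, k=3):
    """Dark channel prior: minimum over a k x k window of the per-pixel
    channel minimum, for an HxWx3 image (fixed window; the paper adapts it)."""
    dc = img.min(axis=2)
    pad = k // 2
    p = np.pad(dc, pad, mode='edge')
    out = np.full_like(dc, np.inf)
    for dy in range(k):
        for dx in range(k):
            out = np.minimum(out, p[dy:dy + dc.shape[0], dx:dx + dc.shape[1]])
    return out

def quadtree_airlight(gray, depth=3):
    """Recursively keep the brightest quadrant to locate the
    atmospheric-light region (the paper's GA refinement is omitted)."""
    for _ in range(depth):
        h, w = gray.shape
        if h < 2 or w < 2:
            break
        quads = [gray[:h // 2, :w // 2], gray[:h // 2, w // 2:],
                 gray[h // 2:, :w // 2], gray[h // 2:, w // 2:]]
        gray = max(quads, key=lambda q: q.mean())
    return float(gray.mean())
```

With an atmospheric light estimate A and a transmission map derived from the dark channel, the hazy-image degradation model then recovers the scene radiance, which here plays the role of the enhanced low-light image.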
Abstract: AIM: To find an effective contrast enhancement method for retinal images for effective segmentation of retinal features. METHODS: A novel image preprocessing method that used neighbourhood-based improved contrast limited adaptive histogram equalization (NICLAHE) to improve retinal image contrast was proposed, to aid in the accurate identification of retinal disorders and improve the visibility of fine retinal structures. Additionally, a minimal-order filter was applied to effectively denoise the images without compromising important retinal structures. The novel NICLAHE algorithm was inspired by the classical CLAHE algorithm, but enhanced it by selecting the clip limits and tile sizes dynamically relative to the pixel values in an image, as opposed to using fixed values. It was evaluated on the DRIVE and high-resolution fundus (HRF) datasets using conventional quality measures. RESULTS: The proposed preprocessing technique was applied to two retinal image databases, DRIVE and HRF, with four quality metrics: root mean square error (RMSE), peak signal-to-noise ratio (PSNR), root mean square contrast (RMSC), and overall contrast. The technique performed superiorly on both datasets compared to traditional enhancement methods. To assess the compatibility of the method with automated diagnosis, a deep learning framework named ResNet was applied to the segmentation of retinal blood vessels. Sensitivity, specificity, precision, and accuracy were used to analyse the performance. NICLAHE-enhanced images outperformed the traditional techniques on both datasets with improved accuracy. CONCLUSION: NICLAHE provides better results than traditional methods, with lower error and improved contrast-related values. The enhanced images are subsequently measured by sensitivity, specificity, precision, and accuracy, yielding better results on both datasets.
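The core CLAHE-style operation, histogram equalization with a clip limit, can be illustrated for a single tile. The adaptation rule below (deriving the clip fraction from the tile's standard deviation) is an illustrative stand-in, not NICLAHE's actual selection rule; the clipping-and-redistribution step is the standard CLAHE mechanism.

```python
import numpy as np

def clipped_equalize(tile, clip_frac=None, bins=256):
    """Histogram equalization of one tile (values in [0,1]) with a clip
    limit. If clip_frac is None, it adapts to the tile's contrast; this
    std-based rule is an illustrative stand-in for NICLAHE's selection."""
    if clip_frac is None:
        clip_frac = 0.01 + 0.04 * tile.std()   # flatter tiles -> lower clip
    hist, _ = np.histogram(tile, bins=bins, range=(0.0, 1.0))
    limit = max(1, int(clip_frac * tile.size))
    excess = np.maximum(hist - limit, 0).sum()
    hist = np.minimum(hist, limit) + excess // bins   # redistribute excess
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]
    idx = np.clip((tile * (bins - 1)).astype(int), 0, bins - 1)
    return cdf[idx]   # map each pixel through the clipped CDF
```

Applying this per tile with bilinear interpolation between tile mappings gives full CLAHE; NICLAHE additionally varies the tile size with local pixel statistics.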
Abstract: The traditional FastSpeech2 model offers high generation efficiency and speech naturalness, but it still has limitations in prosodic modeling, especially the lack of an effective link between semantics and prosody. To enhance the performance of synthesized speech in terms of prosodic expression, a ProsodySpeech speech synthesis system that incorporates the BERT pre-trained language model was proposed in this study. By introducing the Pre-trained Language Model Adapter (PLM Adapter) and the Semantic-Prosody Mapping Network (SPMN), and by fully utilizing the deep semantic information extracted by BERT, the system enhanced its control over prosodic features such as pitch, energy, and duration. The proposed model achieved effective alignment and mapping between semantic information and prosody parameters by introducing a shared semantic processing layer, a global self-attention mechanism, and a specially designed prosody mapping branch. Experimental results showed that the proposed model outperforms VITS and StyleTTS2 in terms of Mean Opinion Score (MOS), and the synthesized speech has a clear advantage in prosodic naturalness and expressive richness, verifying the effectiveness of the proposed model in enhancing prosodic expression and bringing the synthesized speech closer to natural human speech.
Funding: Project supported by the National Key Research and Development Program of China (Grant No. 2023YFA1407000), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB0460000), the National Natural Science Foundation of China (Grant Nos. 12322401, 12127807, and 12393832), the CAS Key Research Program of Frontier Sciences (Grant No. ZDBS-LY-SLH004), the Beijing Nova Program (Grant No. 20230484301), the Youth Innovation Promotion Association, Chinese Academy of Sciences (Grant No. 2023125), and the CAS Project for Young Scientists in Basic Research (Grant No. YSBR-026).
Abstract: Edge structures are ubiquitous in the processing and fabrication of various optoelectronic devices. Novel physical properties and enhanced light–matter interactions are anticipated to occur at crystal edges due to the broken spatial translational symmetry. However, the intensity of first-order Raman scattering at crystal edges has rarely been explored, although the mechanical stress and edge characteristics have been thoroughly studied via the Raman peak shift and the spectral features of the edge-related Raman modes. Here, by taking a GaAs crystal with a well-defined edge as an example, we reveal the intensity enhancement of Raman-active modes and the emergence of Raman-forbidden modes under specific polarization configurations at the edge. This is attributed to the presence of a hot spot at the edge due to the redistributed electromagnetic fields and electromagnetic wave propagation of the incident laser and Raman signal near the edge, which are confirmed by finite-difference time-domain simulations. Spatially resolved Raman intensities of both Raman-active and Raman-forbidden modes near the edge are calculated based on the redistributed electromagnetic fields, which quantitatively reproduce the corresponding experimental results. These findings offer new insights into the intensity enhancement of Raman scattering at crystal edges and present a new avenue to manipulate light–matter interactions of crystals by manufacturing various types of edges and to characterize the edge structures in photonic and optoelectronic devices.
Abstract: Low-light image enhancement is one of the most active research areas in the field of computer vision in recent years. In the low-light image enhancement process, loss of image details and increase in noise occur inevitably, influencing the quality of enhanced images. To alleviate this problem, a low-light image enhancement model called the RetinexNet model based on Retinex theory was proposed in this study. The model was composed of an image decomposition module and a brightness enhancement module. In the decomposition module, a convolutional block attention module (CBAM) was incorporated to enhance the feature representation capacity of the network, focusing on crucial features and suppressing irrelevant ones. A multifeature fusion denoising module was designed within the brightness enhancement module, circumventing the issue of feature loss during downsampling. The proposed model outperforms the existing algorithms in terms of PSNR and SSIM metrics on the publicly available datasets LOL and MIT-Adobe FiveK, as well as gives superior results in terms of NIQE metrics on the publicly available dataset LIME.
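Since nearly every abstract above reports PSNR, its definition is worth making concrete. This is the standard metric, computed here for images scaled to [0, 1]; it is not specific to any one paper.

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float('inf')   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.zeros((8, 8)); b = np.full((8, 8), 0.5)
print(round(psnr(a, b), 2))   # MSE = 0.25 -> 10*log10(4) = 6.02
```

An improvement of 0.22 dB, as reported for RetinexWT above, corresponds to roughly a 5% reduction in mean squared error.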
Funding: Graduate Student Innovation Projects of Beijing University of Civil Engineering and Architecture (No. PG2024121).
Abstract: Underwater images are inherently degraded by color distortion, contrast reduction, and uneven brightness, primarily due to light absorption and scattering in water. To mitigate these challenges, a novel enhancement approach is proposed, integrating Local Adaptive Color Correction (LACC) with contrast enhancement based on adaptive Rayleigh distribution stretching and CLAHE (LACC-RCE). Conventional color correction methods predominantly employ global adjustment strategies, which are often inadequate for handling spatially varying color distortions. In contrast, the proposed LACC method incorporates local color analysis, tone-weighted control, and spatially adaptive adjustments, allowing for region-specific color correction. This approach effectively enhances color fidelity and perceptual naturalness, addressing the limitations of global correction techniques. For contrast enhancement, the proposed method leverages the global mapping characteristics of the Rayleigh distribution to improve overall contrast, while CLAHE is employed to adaptively enhance local regions. A weighted fusion strategy is then applied to synthesize high-quality underwater images. Experimental results indicate that LACC-RCE surpasses conventional methods in color restoration, contrast optimization, and detail preservation, thereby enhancing the visual quality of underwater images. This improvement facilitates more reliable inputs for underwater object detection and recognition tasks.
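The Rayleigh-based global mapping and the weighted fusion step can be sketched as follows. The fixed `sigma` and the fixed fusion weight are illustrative assumptions (the paper's stretching is adaptive and its weighting scheme is richer): intensities in [0, 1] are pushed through the Rayleigh CDF, rescaled to span the full range, and two enhanced versions are blended.

```python
import numpy as np

def rayleigh_stretch(img, sigma=0.35):
    """Global contrast stretch: map intensities in [0,1] through the
    Rayleigh CDF 1 - exp(-x^2 / (2 sigma^2)), rescaled to span [0,1]."""
    cdf = 1.0 - np.exp(-(img ** 2) / (2.0 * sigma ** 2))
    top = 1.0 - np.exp(-1.0 / (2.0 * sigma ** 2))   # CDF value at x = 1
    return cdf / top

def fuse(a, b, w=0.5):
    """Weighted fusion of two enhanced versions (a fixed weight here;
    the paper's weighting is content-adaptive)."""
    return w * a + (1.0 - w) * b
```

In the full method, `fuse` would blend the Rayleigh-stretched (global) image with a CLAHE-enhanced (local) one, combining overall contrast with local detail.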