Medical image segmentation is of critical importance in contemporary medical imaging. However, U-Net and its variants exhibit limitations in capturing complex nonlinear patterns and global contextual information. Although the subsequent U-KAN model enhances nonlinear representation capabilities, it still faces challenges such as gradient vanishing during deep network training and spatial detail loss during feature downsampling, resulting in insufficient segmentation accuracy for edge structures and minute lesions. To address these challenges, this paper proposes the RE-UKAN model, which improves upon U-KAN. First, a residual network is introduced into the encoder to mitigate gradient vanishing through cross-layer identity mappings, thus enhancing modelling capabilities for complex pathological structures. Second, Efficient Local Attention (ELA) is integrated to suppress spatial detail loss during downsampling, thereby improving the perception of edge structures and minute lesions. Experimental results on four public datasets demonstrate that RE-UKAN outperforms existing medical image segmentation methods across multiple evaluation metrics, with particularly strong performance on the TN-SCUI 2020 dataset, where it achieves an IoU of 88.18% and a Dice of 93.57%. Compared to the baseline model, it achieves improvements of 3.05% and 1.72%, respectively. These results demonstrate RE-UKAN’s superior detail retention and boundary recognition accuracy in complex medical image segmentation tasks, providing a reliable solution for clinical precision segmentation.
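The IoU and Dice figures quoted above are overlap measures between a predicted mask and a reference mask. As a minimal sketch (the function name `dice_and_iou` and the toy masks are illustrative assumptions, not taken from the paper), they can be computed for binary masks as follows:

```python
def dice_and_iou(pred, target):
    """Return (dice, iou) for two binary masks given as nested lists of 0/1."""
    inter = pred_sum = target_sum = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += p & t          # pixel counted only if set in both masks
            pred_sum += p
            target_sum += t
    union = pred_sum + target_sum - inter
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0
    dice = 2 * inter / (pred_sum + target_sum)
    iou = inter / union
    return dice, iou


# Toy 3x3 masks (hypothetical data).
pred = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
target = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
dice, iou = dice_and_iou(pred, target)
```

For a single mask pair the two measures are linked by Dice = 2·IoU / (1 + IoU); dataset-level averages such as those reported in these abstracts need not satisfy that relation exactly.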
Automatic segmentation of landslides from remote sensing imagery is challenging because traditional machine learning and early CNN-based models often fail to generalize across heterogeneous landscapes, where segmentation maps contain sparse and fragmented landslide regions under diverse geographical conditions. To address these issues, we propose a lightweight dual-stream siamese deep learning framework that integrates optical and topographical data fusion with an adaptive decoder, guided multimodal fusion, and deep supervision. The framework is built upon the synergistic combination of cross-attention, gated fusion, and sub-pixel upsampling within a unified dual-stream architecture specifically optimized for landslide segmentation, enabling efficient context modeling and robust feature exchange between modalities. The decoder captures long-range context at deeper levels using lightweight cross-attention and refines spatial details at shallower levels through attention-gated skip fusion, enabling precise boundary delineation and fewer false positives. The gated fusion further enhances multimodal integration of optical and topographical cues, and the deep supervision stabilizes training and improves generalization. Moreover, to mitigate checkerboard artifacts, a learnable sub-pixel upsampling is devised to replace the traditional transposed convolution. Despite its compact design with fewer parameters, the model consistently outperforms state-of-the-art baselines. Experiments on two benchmark datasets, Landslide4Sense and Bijie, confirm the effectiveness of the framework. On the Bijie dataset, it achieves an F1-score of 0.9110 and an intersection over union (IoU) of 0.8839. These results highlight its potential for accurate large-scale landslide inventory mapping and real-time disaster response. The implementation is publicly available at https://github.com/mishaown/DiGATe-UNet-LandSlide-Segmentation (accessed on 3 November 2025).
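Sub-pixel upsampling, mentioned above as the replacement for transposed convolution, is commonly realized as a channel-to-space rearrangement ("pixel shuffle") applied after an ordinary learnable convolution. The sketch below shows only that rearrangement in plain Python, using the common channel ordering `c*r*r + i*r + j`; the function name and the toy input are assumptions, and the authors' actual layer may differ.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r)."""
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # vertical offset inside each r x r cell
            for j in range(r):      # horizontal offset inside each r x r cell
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x1 channels become one 2x2 channel (hypothetical data).
x = [[[k]] for k in range(4)]
up = pixel_shuffle(x, 2)
```

Because every output pixel is written exactly once, this rearrangement avoids the uneven kernel overlap that is usually blamed for checkerboard artifacts in transposed convolutions.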
Retinal blood vessel segmentation is crucial for diagnosing ocular and cardiovascular diseases. Although the introduction of U-Net in 2015 by Olaf Ronneberger significantly advanced this field, issues such as limited training data, imbalanced data distribution, and inadequate feature extraction persist, hindering both segmentation performance and model generalization. To address these issues, DEFFA-Unet is proposed. It features an additional encoder that processes domain-invariant pre-processed inputs, thereby providing richer feature encoding and improving model generalization. A feature filtering fusion module is developed to ensure precise feature filtering and robust hybrid feature fusion. Because the task demands high precision and false positives are very costly, traditional skip connections are replaced with an attention-guided feature reconstructing fusion module. Additionally, new data augmentation and balancing methods are proposed to counter data scarcity and distribution imbalance, further boosting the robustness and generalization of the model. Using a comprehensive suite of evaluation metrics, extensive validation on four benchmark datasets (DRIVE, CHASEDB1, STARE, and HRF) and an SLO dataset (IOSTAR) demonstrates the proposed method’s superiority over both baseline and state-of-the-art models. In particular, the proposed method significantly outperforms the compared methods in cross-validation model generalization.
Existing semi-supervised medical image segmentation algorithms use copy-paste data augmentation to correct the labeled-unlabeled data distribution mismatch. However, current copy-paste methods have three limitations: (1) training the model solely with copy-paste mixed images from labeled and unlabeled input loses a lot of labeled information; (2) low-quality pseudo-labels can cause confirmation bias in pseudo-supervised learning on unlabeled data; (3) the segmentation performance in low-contrast and local regions is less than optimal. To overcome these problems, we design a Stochastic Augmentation-Based Dual-Teaching Auxiliary Training Strategy (SADT), which enhances feature diversity and learns high-quality features. More precisely, SADT trains the Student Network using pseudo-label-based training from Teacher Network 1 together with supervised learning on labeled data, which prevents the loss of rare labeled data. We introduce a bi-directional copy-paste mask with progressive high-entropy filtering to reduce data distribution disparities and mitigate confirmation bias in pseudo-supervision. For the mixed images, Deep-Shallow Spatial Contrastive Learning (DSSCL) is proposed in the feature spaces of Teacher Network 2 and the Student Network to improve segmentation in low-contrast and local areas. In this procedure, the features extracted by the Student Network are subjected to a random feature perturbation technique. Extensive experiments on two openly available datasets show that the proposed SADT performs much better than state-of-the-art semi-supervised medical segmentation techniques. Using only 10% of the labeled data for training, SADT achieved a Dice score of 90.10% on the ACDC (Automatic Cardiac Diagnosis Challenge) dataset.
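The bi-directional copy-paste step described above can be pictured as mixing a labeled and an unlabeled image in both directions with one binary mask. The following is a minimal sketch under that reading; the function name, the mask convention (1 = take the pasted source), and the toy arrays are assumptions, and the progressive high-entropy filtering and the teacher networks are not shown.

```python
def bidirectional_copy_paste(labeled, unlabeled, mask):
    """Mix two equally sized 2D images in both directions with a binary mask.

    Returns (labeled_into_unlabeled, unlabeled_into_labeled).
    """
    if not (len(labeled) == len(unlabeled) == len(mask)):
        raise ValueError("inputs must have the same number of rows")
    # Where mask == 1, paste the labeled pixel onto the unlabeled background.
    labeled_into_unlabeled = [
        [lab if m else unl for lab, unl, m in zip(l_row, u_row, m_row)]
        for l_row, u_row, m_row in zip(labeled, unlabeled, mask)
    ]
    # Where mask == 1, paste the unlabeled pixel onto the labeled background.
    unlabeled_into_labeled = [
        [unl if m else lab for lab, unl, m in zip(l_row, u_row, m_row)]
        for l_row, u_row, m_row in zip(labeled, unlabeled, mask)
    ]
    return labeled_into_unlabeled, unlabeled_into_labeled


# Hypothetical 2x2 intensities: 1 marks labeled pixels, 9 marks unlabeled ones.
labeled = [[1, 1], [1, 1]]
unlabeled = [[9, 9], [9, 9]]
mask = [[1, 0], [0, 0]]
mix_a, mix_b = bidirectional_copy_paste(labeled, unlabeled, mask)
```

The same mask would be applied to the ground-truth labels and the pseudo-labels so that each mixed image has a matching mixed supervision target.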
With the continuous development of artificial intelligence and machine learning techniques, effective methods have emerged to support the work of dermatologists in skin cancer detection. However, accurately segmenting melanomas in dermoscopic images remains a significant challenge because of objects that can interfere with human observation, such as bubbles and scales. To address these challenges, we propose a dual U-Net network framework for skin melanoma segmentation. The proposed architecture introduces several components that aim to enhance the performance and capabilities of the traditional U-Net. First, we establish a novel framework that links two simplified U-Nets, enabling more comprehensive information exchange and feature integration throughout the network. Second, after cascading the second U-Net, we introduce a skip connection between the decoder and encoder networks and incorporate a modified receptive field block (MRFB), which is designed to capture multi-scale spatial information. Third, to further enhance feature representation, we add a multi-path convolution block attention module (MCBAM) to the first two layers of the first U-Net encoder and integrate a new squeeze-and-excitation (SE) mechanism with residual connections in the second U-Net. To illustrate the performance of the proposed model, we conducted comprehensive experiments on widely recognized skin datasets. On the ISIC-2017 dataset, the IoU of the proposed model increased from 0.6406 to 0.6819 and the Dice coefficient increased from 0.7625 to 0.8023. On the ISIC-2018 dataset, the IoU of the proposed model improved from 0.7138 to 0.7709, while the Dice coefficient increased from 0.8285 to 0.8665. Furthermore, generalization experiments conducted on the jaw cyst dataset from Quzhou People’s Hospital further verified the segmentation performance of the proposed model. These findings collectively affirm the potential of our approach as a valuable tool for supporting clinical decision-making in skin cancer detection and for advancing research in medical image analysis.
Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment. Their applications span from high-throughput drug screening to the modeling of complex diseases, with some even achieving clinical translation. Changes in the overall size, shape, boundary, and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity. However, the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and interference, including overlapping organoids, bubbles, dust particles, and cell fragments. This paper introduces the precision organoid segmentation technique (POST), which is a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions. Unlike existing methods, POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging. Furthermore, it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments. POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances and thereby streamlining the organoid analysis and drug screening process.
Dear Editor, This letter proposes an open-vocabulary 3D scene understanding model based on a visual-language model. By efficiently integrating 3D point cloud data, image data, and text data, our model overcomes the segmentation problem [1], [2] that traditional models face when dealing with unknown categories [3]. By learning the deep semantic mapping between vision and language, the network significantly improves its ability to recognize unlabeled categories and exceeds current state-of-the-art methods in open-vocabulary scene understanding.
Real-time semantic segmentation tasks place stringent demands on network inference speed, often requiring a reduction in network depth to decrease computational load. However, shallow networks tend to exhibit degradation in feature extraction completeness and inference accuracy. Therefore, balancing high performance with real-time requirements has become a critical issue in the study of real-time semantic segmentation. To address these challenges, this paper proposes a lightweight bilateral dual-residual network. By introducing a novel residual structure combined with feature extraction and fusion modules, the proposed network significantly enhances representational capacity while reducing computational costs. Specifically, an improved compound residual structure is designed to optimize the efficiency of information propagation and feature extraction. Furthermore, the proposed feature extraction and fusion module enables the network to better capture multi-scale information in images, improving the ability to detect both detailed and global semantic features. Experimental results on the publicly available Cityscapes dataset demonstrate that the proposed lightweight dual-branch network achieves outstanding performance while maintaining low computational complexity. In particular, the network achieved a mean Intersection over Union (mIoU) of 78.4% on the Cityscapes validation set, surpassing many existing semantic segmentation models. Additionally, in terms of inference speed, the network reached 74.5 frames per second when tested on an NVIDIA GeForce RTX 3090 GPU, significantly improving real-time performance.
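The mIoU reported above averages the per-class IoU over all classes. A minimal sketch of that computation from a confusion matrix follows; the function name and the two-class toy matrix are assumptions, and benchmark toolkits typically add details such as ignore labels that are omitted here.

```python
def mean_iou(conf):
    """Mean IoU from a square confusion matrix.

    conf[i][j] is the number of pixels of true class i predicted as class j.
    Classes that never occur in either prediction or ground truth are skipped.
    """
    n = len(conf)
    ious = []
    for k in range(n):
        tp = conf[k][k]
        fn = sum(conf[k]) - tp                        # true k, predicted other
        fp = sum(conf[i][k] for i in range(n)) - tp   # predicted k, true other
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


# Hypothetical two-class confusion matrix.
conf = [[50, 10],
        [5, 35]]
miou = mean_iou(conf)
```

Here class 0 has IoU 50/65 and class 1 has IoU 35/50, so the mean is the average of those two values.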
During the operation, maintenance and upkeep of concrete buildings, surface cracks are often regarded as important warning signs of potential damage. Their precise segmentation plays a key role in assessing the health of a building. Traditional manual inspection is subjective, inefficient and poses safety hazards. In contrast, current mainstream computer-vision-based crack segmentation methods still suffer from missed detections, false detections, and segmentation discontinuities. These problems are particularly evident when dealing with small cracks, complex backgrounds, and blurred boundaries. For this reason, this paper proposes a lightweight building surface crack segmentation method, HL-YOLO, based on YOLOv11n-seg, which integrates an attention mechanism and a dilation-wise residual structure. First, we design a lightweight backbone network, RCSAA-Net, which combines ResNet50, capable of multi-scale feature extraction, with a custom Channel-Spatial Aggregation Attention (CSAA) module. This design boosts the model’s capacity to extract features of fine cracks and complex backgrounds. Within this backbone, the CSAA module enhances the model’s attention to critical crack areas by capturing global dependencies in feature maps. Second, we construct an enhanced Content-aware ReAssembly of FEatures (ProCARAFE) module. It introduces a larger receptive field and a dynamic kernel generation mechanism to reconstruct and accurately restore crack edge details. Finally, a Dilation-wise Residual (DWR) structure is introduced to reconstruct the C3k2 modules in the neck. It enhances multi-scale feature extraction and long-range contextual information fusion through multi-rate depthwise dilated convolutions. The improved model’s superiority and generalization ability have been validated through experiments on the self-built dataset. Compared to the baseline model, HL-YOLO improves mean Average Precision at 0.5 IoU by 4.1% and increases the mean Intersection over Union (mIoU) by 4.86%, with only 3.12 million parameters. These results indicate that HL-YOLO can efficiently and accurately identify cracks on building surfaces, meeting the demand for rapid detection and providing an effective technical solution for real-time crack monitoring.
Objective: This study aimed to explore a novel method that integrates segmentation-guided classification and diffusion model augmentation to realize automatic classification of tibial plateau fractures (TPFs). Methods: YOLOv8n-cls was used to construct a baseline model on the data of 3781 patients from the Orthopedic Trauma Center of Wuhan Union Hospital. Additionally, a segmentation-guided classification approach was proposed. To enhance the dataset, a diffusion model was further employed for data augmentation. Results: The novel method that integrated segmentation-guided classification and diffusion model augmentation significantly improved the accuracy and robustness of fracture classification. The average accuracy of classification for TPFs rose from 0.844 to 0.896. The comprehensive performance of the dual-stream model was also significantly enhanced after many rounds of training, with both the macro-area under the curve (AUC) and the micro-AUC increasing from 0.94 to 0.97. By utilizing diffusion model augmentation and segmentation map integration, the model demonstrated superior efficacy in identifying Schatzker Ⅰ, achieving an accuracy of 0.880. It yielded an accuracy of 0.898 for Schatzker Ⅱ and Ⅲ and 0.913 for Schatzker Ⅳ; for Schatzker Ⅴ and Ⅵ, the accuracy was 0.887; and for intercondylar ridge fracture, the accuracy was 0.923. Conclusion: The dual-stream attention-based classification network, which has been verified by many experiments, exhibited great potential in predicting the classification of TPFs. This method facilitates automatic TPF assessment and may assist surgeons in the rapid formulation of surgical plans.
Accurate measurement of bean particle size is essential for automated grading and quality control in agricultural processing. However, existing image segmentation methods often suffer from low efficiency, over-segmentation, and high computational cost. We propose a distance-gradient dual-constrained watershed algorithm for precise segmentation and measurement of bean particles. The method integrates distance transform-based seed extraction with gradient-constrained flooding, effectively suppressing noise-induced region fragmentation and improving the separation of adherent particles. An experimental platform was constructed using an industrial camera and an image-processing pipeline to evaluate performance. Compared with the conventional watershed algorithm, the proposed method improves segmentation accuracy by 7.2% and reduces the mean particle size error by 27.8% (0.13 mm, representing a relative error of 2.4%). Validation on three soybean varieties confirmed the robustness and generalizability of the approach. The results indicate that the proposed algorithm provides an efficient and accurate technique for agricultural particle size analysis, offering potential for integration into practical low-cost inspection systems.
[Background] High harmonic cavities are widely used in electron storage rings to lengthen the bunch and lower the bunch peak current, thereby reducing the IBS effect and enhancing the Touschek lifetime, as well as providing Landau damping, which is particularly important for storage rings operating with ultra-low emittance or at low beam energy. [Purpose] To further increase the bunch length without additional hardware costs, the phase modulation in a dual-RF system is considered. [Methods] In this paper, turn-by-turn simulations incorporating random synchrotron radiation excitation are conducted, and a brief analysis is presented to explain the bunch lengthening mechanism. [Results] Simulation results reveal that the peak current can be further reduced, thereby mitigating IBS effects and enhancing the Touschek lifetime. Although the energy spread increases, which tends to reduce the brightness of higher-harmonic radiation from the undulator, the brightness of the fundamental harmonic can, in fact, be improved.
Because a single monolithic mirror cannot be manufactured at the 10-meter scale, segmented mirrors have become indispensable tools in modern astronomical research. However, to match the imaging performance of a monolithic counterpart, the sub-mirrors must maintain precise co-phasing. Piston error critically degrades segmented mirror imaging quality, necessitating efficient and precise detection. The conventional circular-aperture diffraction method with a two-wavelength algorithm is susceptible to decentration errors, and traditional convolutional neural networks (CNNs) struggle to capture global features under large-range piston errors because of their restricted local receptive fields. To address these limitations, this paper proposes a method that integrates extended Young’s interference principles with a Vision Transformer (ViT) to detect piston error. The method suppresses decentration error interference through two symmetrically arranged apertures and extends the measurement range to ±7.95 μm via a two-wavelength (589 nm/600 nm) algorithm. It exploits the ViT’s self-attention mechanism to model global characteristics of interference fringes. Unlike CNNs constrained by local convolutional kernels, the ViT significantly improves sensitivity to interferogram periodicity. The simulation results demonstrate that the proposed method achieves a measurement accuracy of 5 nm (0.0083λ0) across the range of ±7.95 μm, while maintaining an accuracy exceeding 95% in the presence of Gaussian noise (SNR ≥ 15 dB), Poisson noise (λ ≥ 9 photons/pixel), and sub-mirror gap error (Egap ≤ 0.2) interference. Moreover, the detection speed shows significant improvement compared to the cross-correlation algorithm. This study establishes an accurate, robust framework for segmented mirror error detection, advancing high-precision astronomical observation.
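As a rough plausibility check on the range quoted above, the textbook two-wavelength relation gives a synthetic wavelength Λ = λ1·λ2 / |λ1 − λ2|. The sketch below evaluates it for 589 nm and 600 nm. Treating Λ/4 as the unambiguous piston range (optical path doubled on reflection) and taking λ0 = 600 nm are assumptions made here; the abstract does not state how its ±7.95 μm figure was derived.

```python
# Wavelengths from the abstract, in nanometres.
lambda_1_nm = 589.0
lambda_2_nm = 600.0

# Standard synthetic (beat) wavelength of a two-wavelength measurement.
synthetic_nm = lambda_1_nm * lambda_2_nm / abs(lambda_1_nm - lambda_2_nm)

# Assumption: the piston is measured in reflection, so the path difference is
# twice the piston and the unambiguous piston range is +/- synthetic / 4.
quarter_range_um = synthetic_nm / 4 / 1000.0

# Assumption: lambda_0 = 600 nm, used to express 5 nm as a fraction of a wave.
accuracy_in_waves = 5.0 / lambda_2_nm

print(f"synthetic wavelength: {synthetic_nm / 1000.0:.2f} um")
print(f"quarter of synthetic wavelength: {quarter_range_um:.2f} um")
print(f"5 nm in units of lambda_0: {accuracy_in_waves:.4f}")
```

Under these assumptions Λ is about 32.13 μm and Λ/4 is about 8.03 μm, the same order as the stated ±7.95 μm, and 5 nm corresponds to roughly 0.0083 of a 600 nm wave.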
Accurate and efficient brain tumor segmentation is essential for early diagnosis, treatment planning, and clinical decision-making. However, the complex structure of brain anatomy and the heterogeneous nature of tumors present significant challenges for precise anomaly detection. While U-Net-based architectures have demonstrated strong performance in medical image segmentation, there remains room for improvement in feature extraction and localization accuracy. In this study, we propose a novel hybrid model designed to enhance 3D brain tumor segmentation. The architecture incorporates a 3D ResNet encoder, known for mitigating the vanishing gradient problem, and a 3D U-Net decoder. Additionally, to enhance the model’s generalization ability, a Squeeze-and-Excitation attention mechanism is integrated. We introduce Gabor filter banks into the encoder to further strengthen the model’s ability to extract robust and transformation-invariant features from the complex and irregular shapes typical in medical imaging. This approach, which is not well explored in current U-Net-based segmentation frameworks, provides a unique advantage by enhancing texture-aware feature representation. Specifically, Gabor filters help extract distinctive low-level texture features, reducing the effects of texture interference and facilitating faster convergence during the early stages of training. Our model achieved Dice scores of 0.881, 0.846, and 0.819 for Whole Tumor (WT), Tumor Core (TC), and Enhancing Tumor (ET), respectively, on the BraTS 2020 dataset. Cross-validation on the BraTS 2021 dataset further confirmed the model’s robustness, yielding Dice scores of 0.887 for WT, 0.856 for TC, and 0.824 for ET. The proposed model outperforms several existing state-of-the-art models, particularly in accurately identifying small and complex tumor regions. Extensive evaluations suggest that integrating advanced preprocessing with an attention-augmented hybrid architecture offers significant potential for reliable and clinically valuable brain tumor segmentation.
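A Gabor filter bank of the kind mentioned above is a set of Gaussian-windowed sinusoids at several orientations. The sketch below builds the real part of the standard 2D Gabor kernel in plain Python; the paper works in 3D, and the kernel size, parameter values, and function name used here are illustrative assumptions rather than the authors' settings.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor filter on a size x size grid (size must be odd).

    sigma: Gaussian envelope width, theta: orientation in radians,
    lambd: wavelength of the cosine carrier, gamma: spatial aspect ratio,
    psi: phase offset.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# A small bank with four orientations: 0, 45, 90 and 135 degrees.
bank = [gabor_kernel(7, sigma=2.0, theta=k * math.pi / 4, lambd=4.0)
        for k in range(4)]
```

In a network, such kernels would typically initialize or replace the weights of an early convolution layer so that the first features respond to oriented texture.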
Detailed individual tree crown segmentation is highly relevant for the detection and monitoring of Fraxinus excelsior L. trees affected by ash dieback, a major threat to common ash populations across Europe. In this study, both fine and coarse crown segmentation methods were applied to close-range multispectral UAV imagery. The fine tree crown segmentation method utilized a novel unsupervised machine learning approach based on a blended NIR-NDVI image, whereas the coarse segmentation relied on the segment anything model (SAM). Both methods successfully delineated tree crown outlines; however, only the fine segmentation accurately captured internal canopy gaps. Despite these structural differences, mean NDVI values calculated per tree crown revealed no significant differences between the two approaches, indicating that coarse segmentation is sufficient for mean vegetation index assessments. Nevertheless, the fine segmentation revealed increased heterogeneity in NDVI values in more severely damaged trees, underscoring its value for detailed structural and health analyses. Furthermore, the fine segmentation workflow proved transferable to both individual UAV images and orthophotos from broader UAV surveys. For applications focused on structural integrity and spatial variation in canopy health, the fine segmentation approach is recommended.
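The per-crown NDVI statistics discussed above follow from the standard definition NDVI = (NIR − Red) / (NIR + Red), evaluated over the pixels inside a crown mask. A minimal sketch, with a hypothetical function name and toy reflectance values, is:

```python
import statistics


def crown_ndvi_stats(nir, red, crown_mask):
    """Return (mean, population std) of NDVI over pixels where crown_mask is 1.

    nir, red and crown_mask are equally sized nested lists.
    Pixels with NIR + Red == 0 are skipped to avoid division by zero.
    """
    values = []
    for nir_row, red_row, mask_row in zip(nir, red, crown_mask):
        for n, r, m in zip(nir_row, red_row, mask_row):
            if m and (n + r) != 0:
                values.append((n - r) / (n + r))
    if not values:
        raise ValueError("crown mask selects no valid pixels")
    return statistics.fmean(values), statistics.pstdev(values)


# Hypothetical 2x2 reflectance tiles and a crown mask covering three pixels.
nir = [[0.8, 0.6], [0.5, 0.4]]
red = [[0.2, 0.2], [0.5, 0.1]]
crown_mask = [[1, 1], [0, 1]]
mean_ndvi, std_ndvi = crown_ndvi_stats(nir, red, crown_mask)
```

A coarse and a fine mask of the same crown can give similar means while differing in spread, which is the kind of heterogeneity the abstract attributes to more severely damaged trees.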
AIM: To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy (CSC) leakage points, thereby enabling ophthalmologists to deliver accurate laser treatment without navigational laser equipment. METHODS: A dataset with dual labels (point-level and pixel-level) was first established based on fundus fluorescein angiography (FFA) images of CSC and subsequently divided into training (102 images), validation (40 images), and test (40 images) datasets. An intelligent segmentation method was then developed, based on the You Only Look Once version 8 Pose Estimation (YOLOv8-Pose) model and the segment anything model (SAM), to segment CSC leakage points. Next, the YOLOv8-Pose model was trained for 200 epochs, and the best-performing model was selected to form the optimal combination with SAM. Additionally, five classic U-Net series models [i.e., U-Net, recurrent residual U-Net (R2U-Net), attention U-Net (AttU-Net), recurrent residual attention U-Net (R2AttUNet), and nested U-Net (UNet++)] were initialized with three random seeds and trained for 200 epochs, resulting in a total of 15 baseline models for comparison. Finally, based on metrics including the Dice similarity coefficient (DICE), intersection over union (IoU), precision, recall, precision-recall (PR) curve, and receiver operating characteristic (ROC) curve, the proposed method was compared with the baseline models through quantitative and qualitative experiments on leakage point segmentation, thereby demonstrating its effectiveness. RESULTS: As training progressed, the mAP50-95, recall, and precision of the YOLOv8-Pose model increased significantly and then stabilized, and the model achieved a preliminary localization success rate of 90% (i.e., 36 images) for CSC leakage points in the 40 test images. Using expert-annotated pixel-level labels as the ground truth, the proposed method achieved a DICE of 57.13%, an IoU of 45.31%, a precision of 45.91%, a recall of 93.57%, an area under the PR curve (AUC-PR) of 0.78 and an area under the ROC curve (AUC-ROC) of 0.97, which enables more accurate segmentation of CSC leakage points. CONCLUSION: By combining the precise localization capability of the YOLOv8-Pose model with the robust and flexible segmentation ability of SAM, the proposed method not only demonstrates the effectiveness of the YOLOv8-Pose model in detecting keypoint coordinates of CSC leakage points from the perspective of application innovation but also establishes a novel approach for accurate segmentation of CSC leakage points through the “detect-then-segment” strategy, thereby providing a potential auxiliary means for the automatic and precise real-time localization of leakage points during traditional laser photocoagulation for CSC.
Weakly Supervised Semantic Segmentation (WSSS), which relies only on image-level labels, has attracted significant attention for its cost-effectiveness and scalability. Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations. However, they often neglect the complex contextual dependencies among image patches, resulting in incomplete local representations and limited segmentation accuracy. To address these issues, we propose the Context Patch Fusion with Class Token Enhancement (CPF-CTE) framework, which exploits contextual relations among patches to enrich feature representations and improve segmentation. At its core, the Contextual-Fusion Bidirectional Long Short-Term Memory (CF-BiLSTM) module captures spatial dependencies between patches and enables bidirectional information flow, yielding a more comprehensive understanding of spatial correlations. This strengthens feature learning and segmentation robustness. Moreover, we introduce learnable class tokens that dynamically encode and refine class-specific semantics, enhancing discriminative capability. By effectively integrating spatial and semantic cues, CPF-CTE produces richer and more accurate representations of image content. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.
Achieving simultaneous enhancement of crystallinity and optimal domain size remains a fundamental challenge in organic photovoltaics (OPVs), where conventional crystallization strategies often trigger excessive aggregation of small-molecule acceptors. This work pioneers a kinetic paradigm for resolving the crystallinity-domain size trade-off in organic photovoltaics through dual-additive-guided stepwise crystallization. By strategically pairing 1,2-dichlorobenzene (o-DCB, low binding energy to Y6) and 1-fluoronaphthalene (FN, high binding energy), we achieve temporally decoupled crystallization control: o-DCB first mediates donor-acceptor co-crystallization during film formation, constructing a metastable network, whereupon FN induces confined Y6 crystallization within this framework during thermal annealing, refining nanostructure without over-aggregation. Morphology studies reveal that this synergy enhances crystallinity of (100) diffraction peaks by 21%–10% versus single-additive controls (o-DCB/FN alone), while maintaining optimal domain size. These morphological advantages yield balanced carrier transport (μh/μe = 1.23), near-unity exciton dissociation (98.53%), and a champion power conversion efficiency (PCE) of 18.08% for PM6:Y6, significantly surpassing single-additive devices (o-DCB: 17.20%; FN: 17.53%). Crucially, the dual-additive strategy demonstrates universal applicability across diverse active layer systems, achieving an outstanding PCE of 19.27% in PM6:L8-BO-based devices, thereby establishing a general framework for morphology control in high-efficiency OPVs.
Clock synchronization has important applications in multi-agent collaboration (such as drone light shows, intelligent transportation systems, and game AI), group decision-making, and emergency rescue operations. Synchronization methods based on pulse-coupled oscillators (PCOs) provide an effective solution for clock synchronization in wireless networks. However, existing clock synchronization algorithms for multi-agent ad hoc networks struggle to meet the high-precision and high-stability requirements of group cooperation. Hence, this paper constructs a network model, named DAUNet (unsupervised neural network based on dual attention), to enhance clock synchronization accuracy in multi-agent wireless ad hoc networks. Specifically, we design an unsupervised distributed neural network framework as the backbone, building upon classical PCO-based synchronization methods. This framework resolves issues such as prolonged exchange of time synchronization messages between nodes, difficulties in centralized node coordination, and challenges in distributed training. Furthermore, we introduce a dual-attention mechanism as the core module of DAUNet. By integrating a Multi-Head Attention module and a Gated Attention module, the model significantly improves information extraction while reducing computational complexity, effectively mitigating synchronization inaccuracies and instability in multi-agent ad hoc networks. To evaluate the proposed model, comparative experiments and ablation studies were conducted against classical methods and existing deep learning models. The results show that, compared with deep learning networks based on DASA and LSTM, DAUNet reduces the mean normalized phase difference (NPD) by 1 to 2 orders of magnitude. Compared with attention models based on additive attention and self-attention mechanisms, the performance of DAUNet improves by more than ten times. This study demonstrates DAUNet's potential in advancing multi-agent ad hoc networking technologies.
Abstract: Medical image segmentation is of critical importance in contemporary medical imaging. However, U-Net and its variants exhibit limitations in capturing complex nonlinear patterns and global contextual information. Although the subsequent U-KAN model enhances nonlinear representation capabilities, it still faces challenges such as gradient vanishing during deep network training and spatial detail loss during feature downsampling, resulting in insufficient segmentation accuracy for edge structures and minute lesions. To address these challenges, this paper proposes the RE-UKAN model, which improves upon U-KAN. Firstly, a residual network is introduced into the encoder to mitigate gradient vanishing through cross-layer identity mappings, thus enhancing modelling capabilities for complex pathological structures. Secondly, Efficient Local Attention (ELA) is integrated to suppress spatial detail loss during downsampling, thereby improving the perception of edge structures and minute lesions. Experimental results on four public datasets demonstrate that RE-UKAN outperforms existing medical image segmentation methods across multiple evaluation metrics, with particularly strong performance on the TN-SCUI 2020 dataset, achieving an IoU of 88.18% and a Dice of 93.57%. Compared to the baseline model, it achieves improvements of 3.05% and 1.72%, respectively. These results demonstrate RE-UKAN's superior detail retention capability and boundary recognition accuracy in complex medical image segmentation tasks, providing a reliable solution for clinical precision segmentation.
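The IoU and Dice figures quoted above are overlap ratios between a predicted mask and a reference mask. The following is a minimal sketch of how the two metrics are commonly computed for binary masks; the function name `iou_and_dice` and the empty-mask convention are illustrative assumptions, not details taken from the RE-UKAN paper.

```python
import numpy as np


def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks are empty: treated here as perfect agreement
        # (a convention chosen for this sketch, not from the source).
        return 1.0, 1.0
    return float(intersection / union), float(2 * intersection / total)


# Toy example: 2 overlapping pixels, 4 pixels in the union, 3 pixels per mask.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
iou, dice = iou_and_dice(pred, target)
```

For a single pair of masks the two metrics are linked by Dice = 2·IoU / (1 + IoU); dataset-level figures are usually averaged per image, so reported averages need not satisfy that identity exactly.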
Funding: Funded by the National Natural Science Foundation of China, grant number 62262045; the Fundamental Research Funds for the Central Universities, grant number 2023CDJYGRH-YB11; and the Open Funding of SUGON Industrial Control and Security Center, grant number CUIT-SICSC-2025-03.
Abstract: Automatic segmentation of landslides from remote sensing imagery is challenging because traditional machine learning and early CNN-based models often fail to generalize across heterogeneous landscapes, where segmentation maps contain sparse and fragmented landslide regions under diverse geographical conditions. To address these issues, we propose a lightweight dual-stream siamese deep learning framework that integrates optical and topographical data fusion with an adaptive decoder, guided multimodal fusion, and deep supervision. The framework combines cross-attention, gated fusion, and sub-pixel upsampling within a unified dual-stream architecture optimized for landslide segmentation, enabling efficient context modeling and robust feature exchange between modalities. The decoder captures long-range context at deeper levels using lightweight cross-attention and refines spatial details at shallower levels through attention-gated skip fusion, enabling precise boundary delineation and fewer false positives. The gated fusion further enhances multimodal integration of optical and topographical cues, and the deep supervision stabilizes training and improves generalization. Moreover, to mitigate checkerboard artifacts, a learnable sub-pixel upsampling is devised to replace the traditional transposed convolution. Despite its compact design with fewer parameters, the model consistently outperforms state-of-the-art baselines. Experiments on two benchmark datasets, Landslide4Sense and Bijie, confirm the effectiveness of the framework. On the Bijie dataset, it achieves an F1-score of 0.9110 and an intersection over union (IoU) of 0.8839. These results highlight its potential for accurate large-scale landslide inventory mapping and real-time disaster response. The implementation is publicly available at https://github.com/mishaown/DiGATe-UNet-LandSlide-Segmentation (accessed on 3 November 2025).
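Sub-pixel upsampling, mentioned above as the replacement for transposed convolution, typically pairs a learnable convolution with a fixed channel-to-space rearrangement (often called pixel shuffle). The sketch below shows only that rearrangement step in NumPy, under the assumption that the framework follows the standard formulation; the function name `pixel_shuffle` is illustrative and the learnable convolution that would precede it is omitted.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Output pixel (c, h*r + i, w*r + j) is taken from input channel
    c*r*r + i*r + j at position (h, w).
    """
    x = np.asarray(x)
    channels, height, width = x.shape
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels // (r * r)
    x = x.reshape(c_out, r, r, height, width)   # axes: (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)              # axes: (c, h, i, w, j)
    return x.reshape(c_out, height * r, width * r)


# Four 1x1 channels become one 2x2 map.
features = np.arange(4).reshape(4, 1, 1)
upsampled = pixel_shuffle(features, 2)
```

Because every output pixel is written exactly once, this rearrangement avoids the uneven kernel overlap that causes checkerboard patterns in transposed convolutions.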
Abstract: Retinal blood vessel segmentation is crucial for diagnosing ocular and cardiovascular diseases. Although the introduction of U-Net in 2015 by Olaf Ronneberger significantly advanced this field, issues such as limited training data, imbalanced data distribution, and inadequate feature extraction persist, hindering both segmentation performance and model generalization. To address these issues, the DEFFA-Unet is proposed, featuring an additional encoder that processes domain-invariant pre-processed inputs, thereby enriching feature encoding and enhancing model generalization. A feature filtering fusion module is developed to ensure precise feature filtering and robust hybrid feature fusion. In response to the task-specific need for higher precision, where false positives are very costly, traditional skip connections are replaced with an attention-guided feature reconstructing fusion module. Additionally, new data augmentation and balancing methods are proposed to counter data scarcity and distribution imbalance, further boosting the robustness and generalization of the model. With a comprehensive suite of evaluation metrics, extensive validations on four benchmark datasets (DRIVE, CHASEDB1, STARE, and HRF) and an SLO dataset (IOSTAR) demonstrate the proposed method's superiority over both baseline and state-of-the-art models. In particular, the proposed method significantly outperforms the compared methods in cross-validation model generalization.
Funding: Supported by the Natural Science Foundation of China (No. 41804112, author: Chengyun Song).
Abstract: Existing semi-supervised medical image segmentation algorithms use copy-paste data augmentation to correct the labeled-unlabeled data distribution mismatch. However, current copy-paste methods have three limitations: (1) training the model solely with copy-paste mixed images from labeled and unlabeled input loses much of the labeled information; (2) low-quality pseudo-labels can cause confirmation bias in pseudo-supervised learning on unlabeled data; (3) segmentation performance in low-contrast and local regions is suboptimal. We design a Stochastic Augmentation-Based Dual-Teaching Auxiliary Training Strategy (SADT), which enhances feature diversity and learns high-quality features to overcome these problems. More precisely, SADT trains the Student Network using pseudo-label-based training from Teacher Network 1 and supervised learning with labeled data, which prevents the loss of scarce labeled data. We introduce a bi-directional copy-paste mask with progressive high-entropy filtering to reduce data distribution disparities and mitigate confirmation bias in pseudo-supervision. For the mixed images, Deep-Shallow Spatial Contrastive Learning (DSSCL) is proposed in the feature spaces of Teacher Network 2 and the Student Network to improve segmentation in low-contrast and local areas. In this procedure, the features extracted by the Student Network are subjected to a random feature perturbation technique. Extensive experiments on two openly available datasets show that the proposed SADT performs much better than state-of-the-art semi-supervised medical segmentation techniques. Using only 10% of the labeled data for training, SADT achieved a Dice score of 90.10% on the ACDC (Automatic Cardiac Diagnosis Challenge) dataset.
Funding: Funded by the Zhejiang Basic Public Welfare Research Project, grant number LZY24E060001; supported by Guangzhou Development Zone Science and Technology (2021GH10, 2020GH10, 2023GH02), the University of Macao (MYRG2022-00271-FST), and the Science and Technology Development Fund (FDCT) of Macao (0032/2022/A).
Abstract: With the continuous development of artificial intelligence and machine learning techniques, effective methods have emerged to support the work of dermatologists in skin cancer detection. However, accurately segmenting melanomas in dermoscopic images remains a significant challenge because objects such as bubbles and scales can interfere with human observation. To address these challenges, we propose a dual U-Net network framework for skin melanoma segmentation. The proposed architecture introduces several components that aim to enhance the performance and capabilities of the traditional U-Net. First, we establish a framework that links two simplified U-Nets, enabling more comprehensive information exchange and feature integration throughout the network. Second, after cascading the second U-Net, we introduce a skip connection between the decoder and encoder networks and incorporate a modified receptive field block (MRFB), which is designed to capture multi-scale spatial information. Third, to further enhance feature representation, we add a multi-path convolution block attention module (MCBAM) to the first two layers of the first U-Net encoder and integrate a new squeeze-and-excitation (SE) mechanism with residual connections in the second U-Net. To illustrate the performance of the proposed model, we conducted comprehensive experiments on widely recognized skin datasets. On the ISIC-2017 dataset, the IoU of the proposed model increased from 0.6406 to 0.6819 and the Dice coefficient increased from 0.7625 to 0.8023. On the ISIC-2018 dataset, the IoU improved from 0.7138 to 0.7709, while the Dice coefficient increased from 0.8285 to 0.8665. Furthermore, generalization experiments conducted on the jaw cyst dataset from Quzhou People's Hospital further verified the segmentation performance of the proposed model. These findings collectively affirm the potential of our approach as a valuable tool for supporting clinical decision-making in skin cancer detection, as well as for advancing research in medical image analysis.
Funding: Supported by the National Key R&D Program of China (No. 2022YFC2504403), the National Natural Science Foundation of China (No. 62172202), the Experiment Project of China Manned Space Program (No. HYZHXM01019), and the Fundamental Research Funds for the Central Universities from Southeast University (No. 3207032101C3).
Abstract: Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment. Their applications span from high-throughput drug screening to the modeling of complex diseases, with some even achieving clinical translation. Changes in the overall size, shape, boundary, and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity. However, the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and by interference, including overlapping organoids, bubbles, dust particles, and cell fragments. This paper introduces the precision organoid segmentation technique (POST), a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions. Unlike existing methods, POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging. Furthermore, it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments. POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances, thereby streamlining the organoid analysis and drug screening process.
Funding: Supported by CAFUC (ZHMH 2022-005), the Key Laboratory of Flight Techniques and Flight Safety (FZ2022ZZ06), and Flight Technology and Flight Safety of the Civil Aviation Administration of China (FZ2022KF10).
Abstract: Dear Editor, This letter proposes an open-vocabulary 3D scene understanding model based on a visual-language model. By efficiently integrating 3D point cloud data, image data, and text data, our model overcomes the segmentation problem [1], [2] that traditional models face when dealing with unknown categories [3]. By learning the deep semantic mapping between vision and language, the network significantly improves its ability to recognize unlabeled categories and exceeds current state-of-the-art methods in open-vocabulary scene understanding.
Abstract: Real-time semantic segmentation tasks place stringent demands on network inference speed, often requiring a reduction in network depth to decrease computational load. However, shallow networks tend to exhibit degraded feature extraction completeness and inference accuracy. Therefore, balancing high performance with real-time requirements has become a critical issue in the study of real-time semantic segmentation. To address these challenges, this paper proposes a lightweight bilateral dual-residual network. By introducing a novel residual structure combined with feature extraction and fusion modules, the proposed network significantly enhances representational capacity while reducing computational costs. Specifically, an improved compound residual structure is designed to optimize the efficiency of information propagation and feature extraction. Furthermore, the proposed feature extraction and fusion module enables the network to better capture multi-scale information in images, improving the ability to detect both detailed and global semantic features. Experimental results on the publicly available Cityscapes dataset demonstrate that the proposed lightweight dual-branch network achieves strong performance while maintaining low computational complexity. In particular, the network achieved a mean Intersection over Union (mIoU) of 78.4% on the Cityscapes validation set, surpassing many existing semantic segmentation models. Additionally, in terms of inference speed, the network reached 74.5 frames per second when tested on an NVIDIA GeForce RTX 3090 GPU, significantly improving real-time performance.
Funding: Support from the Natural Science Foundation of Hunan Province (Grant No. 2024JJ8055) and the Hunan Yiduoyun Commodity Intelligence Project (Grant No. h2024-003).
Abstract: During the operation and maintenance of concrete buildings, surface cracks are often regarded as important warning signs of potential damage. Their precise segmentation plays a key role in assessing the health of a building. Traditional manual inspection is subjective, inefficient, and hazardous. In contrast, current mainstream computer vision–based crack segmentation methods still suffer from missed detections, false detections, and segmentation discontinuities. These problems are particularly evident when dealing with small cracks, complex backgrounds, and blurred boundaries. For this reason, this paper proposes a lightweight building surface crack segmentation method, HL-YOLO, based on YOLOv11n-seg, which integrates an attention mechanism and a dilation-wise residual structure. First, we design a lightweight backbone network, RCSAA-Net, which combines ResNet50, capable of multi-scale feature extraction, with a custom Channel-Spatial Aggregation Attention (CSAA) module. This design boosts the model's capacity to extract features of fine cracks and complex backgrounds. Within it, the CSAA module enhances the model's attention to critical crack areas by capturing global dependencies in feature maps. Second, we construct an enhanced Content-aware ReAssembly of FEatures (ProCARAFE) module. It introduces a larger receptive field and a dynamic kernel generation mechanism to reconstruct and accurately restore crack edge details. Finally, a Dilation-wise Residual (DWR) structure is introduced to reconstruct the C3k2 modules in the neck. It enhances multi-scale feature extraction and long-range contextual information fusion through multi-rate depthwise dilated convolutions. The improved model's superiority and generalization ability have been validated through experiments on the self-built dataset. Compared to the baseline model, HL-YOLO improves mean Average Precision at 0.5 IoU by 4.1% and increases the mean Intersection over Union (mIoU) by 4.86%, with only 3.12 million parameters. These results indicate that HL-YOLO can efficiently and accurately identify cracks on building surfaces, meeting the demand for rapid detection and providing an effective technical solution for real-time crack monitoring.
Funding: Supported by the National Natural Science Foundation of China (Nos. 81974355 and 82172524), the Key Research and Development Program of Hubei Province (No. 2021BEA161), the National Innovation Platform Development Program (No. 2020021105012440), the Open Project Funding of the Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University (No. 2024BDIAA03), and the Free Innovation Preliminary Research Fund of Wuhan Union Hospital (No. 2024XHYN047).
Abstract: Objective: This study aimed to explore a novel method that integrates segmentation-guided classification and diffusion model augmentation to realize automatic classification of tibial plateau fractures (TPFs). Methods: YOLOv8n-cls was used to construct a baseline model on the data of 3781 patients from the Orthopedic Trauma Center of Wuhan Union Hospital. Additionally, a segmentation-guided classification approach was proposed. To enhance the dataset, a diffusion model was further used for data augmentation. Results: The novel method that integrated segmentation-guided classification and diffusion model augmentation significantly improved the accuracy and robustness of fracture classification. The average classification accuracy for TPFs rose from 0.844 to 0.896. The comprehensive performance of the dual-stream model was also significantly enhanced after many rounds of training, with both the macro-area under the curve (AUC) and the micro-AUC increasing from 0.94 to 0.97. By utilizing diffusion model augmentation and segmentation map integration, the model demonstrated superior efficacy in identifying Schatzker Ⅰ, achieving an accuracy of 0.880. It yielded an accuracy of 0.898 for Schatzker Ⅱ and Ⅲ and 0.913 for Schatzker Ⅳ; for Schatzker Ⅴ and Ⅵ, the accuracy was 0.887; and for intercondylar ridge fracture, the accuracy was 0.923. Conclusion: The dual-stream attention-based classification network, which has been verified by many experiments, exhibited great potential in predicting the classification of TPFs. This method facilitates automatic TPF assessment and may assist surgeons in the rapid formulation of surgical plans.
Funding: Supported by the National Natural Science Foundation of China (No. 62006092), the University Synergy Innovation Program of Anhui Province (No. GXXT-2023-108), and the Excellent Youth Project of Natural Science Research in Anhui Province (No. 2023AH030081).
Abstract: Accurate measurement of bean particle size is essential for automated grading and quality control in agricultural processing. However, existing image segmentation methods often suffer from low efficiency, over-segmentation, and high computational cost. We propose a distance-gradient dual-constrained watershed algorithm for precise segmentation and measurement of bean particles. The method integrates distance transform-based seed extraction with gradient-constrained flooding, effectively suppressing noise-induced region fragmentation and improving the separation of adherent particles. An experimental platform was constructed using an industrial camera and an image-processing pipeline to evaluate performance. Compared with the conventional watershed algorithm, the proposed method improves segmentation accuracy by 7.2% and reduces the mean particle size error by 27.8% (0.13 mm, representing a relative error of 2.4%). Validation on three soybean varieties confirmed the robustness and generalizability of the approach. The results indicate that the proposed algorithm provides an efficient and accurate technique for agricultural particle size analysis, with potential for integration into practical low-cost inspection systems.
Funding: National Natural Science Foundation of China (12405168); the Fundamental Research Funds for the Central Universities, China (2024CDJXY004).
Abstract: [Background] High harmonic cavities are widely used in electron storage rings to lengthen the bunch and lower the bunch peak current, thereby reducing the intrabeam scattering (IBS) effect, enhancing the Touschek lifetime, and providing Landau damping, which is particularly important for storage rings operating with ultra-low emittance or at low beam energy. [Purpose] To further increase the bunch length without additional hardware costs, phase modulation in a dual-RF system is considered. [Methods] In this paper, turn-by-turn simulations incorporating random synchrotron radiation excitation are conducted, and a brief analysis is presented to explain the bunch lengthening mechanism. [Results] Simulation results reveal that the peak current can be further reduced, thereby mitigating IBS effects and enhancing the Touschek lifetime. Although the energy spread increases, which tends to reduce the brightness of higher-harmonic radiation from the undulator, the brightness of the fundamental harmonic can in fact be improved.
Abstract: Because a single monolithic mirror cannot be manufactured at the 10-meter scale, segmented mirrors have become indispensable tools in modern astronomical research. However, to match the imaging performance of a monolithic counterpart, the sub-mirrors must maintain precise co-phasing. Piston error critically degrades segmented mirror imaging quality, necessitating efficient and precise detection. Two limitations motivate this work: the conventional circular-aperture diffraction method with a two-wavelength algorithm is susceptible to decentration errors, and traditional convolutional neural networks (CNNs) struggle to capture global features under large-range piston errors because of their restricted local receptive fields. This paper therefore proposes a method that integrates extended Young's interference principles with a Vision Transformer (ViT) to detect piston error. The method suppresses decentration error interference through two symmetrically arranged apertures and extends the measurement range to ±7.95 μm via a two-wavelength (589 nm/600 nm) algorithm. It exploits the ViT's self-attention mechanism to model global characteristics of interference fringes. Unlike CNNs constrained by local convolutional kernels, the ViT significantly improves sensitivity to interferogram periodicity. The simulation results demonstrate that the proposed method achieves a measurement accuracy of 5 nm (0.0083λ0) across the range of ±7.95 μm, while maintaining an accuracy exceeding 95% in the presence of Gaussian noise (SNR ≥ 15 dB), Poisson noise (λ ≥ 9 photons/pixel), and sub-mirror gap error (Egap ≤ 0.2) interference. Moreover, the detection speed shows significant improvement compared to the cross-correlation algorithm. This study establishes an accurate, robust framework for segmented mirror error detection, advancing high-precision astronomical observation.
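The range extension of a two-wavelength algorithm can be illustrated with the textbook synthetic-wavelength formula Λ = λ1·λ2 / |λ1 − λ2|. The sketch below applies it to the 589 nm/600 nm pair from the abstract. It assumes that a piston error p produces an optical path difference of 2p on reflection and that the unambiguous path-difference range is ±Λ/2; both are standard textbook assumptions rather than details confirmed by the abstract, and the function name is illustrative.

```python
def synthetic_wavelength_nm(lambda_1_nm, lambda_2_nm):
    """Synthetic (beat) wavelength of a two-wavelength measurement, in nm."""
    return lambda_1_nm * lambda_2_nm / abs(lambda_1_nm - lambda_2_nm)


# Wavelength pair quoted in the abstract.
synthetic_nm = synthetic_wavelength_nm(589.0, 600.0)

# Assumption: piston p gives an optical path difference of 2p, and the path
# difference is unambiguous within +/- synthetic_nm / 2, so p is unambiguous
# within +/- synthetic_nm / 4.
piston_range_um = synthetic_nm / 4.0 / 1000.0

# The quoted 5 nm accuracy expressed as a fraction of 600 nm.
accuracy_in_waves = 5.0 / 600.0
```

Under these assumptions the estimate comes out slightly above ±8 μm, the same order as the ±7.95 μm reported; the abstract does not state how the exact figure was derived. The ratio 5 nm / 600 nm ≈ 0.0083 is consistent with the quoted 0.0083λ0 if λ0 is taken to be 600 nm.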
Funding: The authors thank the National Science and Technology Council (NSTC) of the Republic of China, Taiwan, for financially supporting this research under Contract No. NSTC 112-2637-M-131-001.
Abstract: Accurate and efficient brain tumor segmentation is essential for early diagnosis, treatment planning, and clinical decision-making. However, the complex structure of brain anatomy and the heterogeneous nature of tumors present significant challenges for precise anomaly detection. While U-Net-based architectures have demonstrated strong performance in medical image segmentation, there remains room for improvement in feature extraction and localization accuracy. In this study, we propose a hybrid model designed to enhance 3D brain tumor segmentation. The architecture incorporates a 3D ResNet encoder, known for mitigating the vanishing gradient problem, and a 3D U-Net decoder. Additionally, to enhance the model's generalization ability, a Squeeze-and-Excitation attention mechanism is integrated. We introduce Gabor filter banks into the encoder to further strengthen the model's ability to extract robust and transformation-invariant features from the complex and irregular shapes typical in medical imaging. This approach, which is not well explored in current U-Net-based segmentation frameworks, enhances texture-aware feature representation. Specifically, Gabor filters help extract distinctive low-level texture features, reducing the effects of texture interference and facilitating faster convergence during the early stages of training. Our model achieved Dice scores of 0.881, 0.846, and 0.819 for Whole Tumor (WT), Tumor Core (TC), and Enhancing Tumor (ET), respectively, on the BraTS 2020 dataset. Cross-validation on the BraTS 2021 dataset further confirmed the model's robustness, yielding Dice scores of 0.887 for WT, 0.856 for TC, and 0.824 for ET. The proposed model outperforms several state-of-the-art models, particularly in accurately identifying small and complex tumor regions. Extensive evaluations suggest that integrating advanced preprocessing with an attention-augmented hybrid architecture offers significant potential for reliable and clinically valuable brain tumor segmentation.
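A Gabor filter is a sinusoidal carrier multiplied by a Gaussian envelope, and a filter bank is a set of such kernels at several orientations. The sketch below builds a small 2D bank in NumPy purely for illustration; the paper works with 3D volumes, and the kernel size, `sigma`, `wavelength`, `gamma`, and the four orientations used here are assumed values, not the authors' settings.

```python
import numpy as np


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real-valued 2D Gabor kernel of shape (size, size); size should be odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along direction theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength + psi)
    return envelope * carrier


# Hypothetical bank: four orientations evenly spaced over [0, pi).
orientations = np.linspace(0.0, np.pi, 4, endpoint=False)
bank = [gabor_kernel(7, 2.0, theta, 4.0) for theta in orientations]
```

Convolving an image slice with each kernel in `bank` yields one orientation-selective texture response per kernel; such responses are the kind of low-level texture features the abstract refers to.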
Funding: This study was conducted within the project FraxVir, "Detection, characterisation and analyses of the occurrence of viruses and ash dieback in special stands of Fraxinus excelsior - a supplementary study to the FraxForFuture demonstration project", and receives funding via the Waldklimafonds (WKF), funded by the German Federal Ministry of Food and Agriculture (BMEL) and the Federal Ministry for the Environment, Nature Conservation, Nuclear Safety and Consumer Protection (BMUV) and administrated by the Agency for Renewable Resources (FNR) under grant agreement 2220WK40A4.
Abstract: Detailed individual tree crown segmentation is highly relevant for the detection and monitoring of Fraxinus excelsior L. trees affected by ash dieback, a major threat to common ash populations across Europe. In this study, both fine and coarse crown segmentation methods were applied to close-range multispectral UAV imagery. The fine tree crown segmentation method utilized a novel unsupervised machine learning approach based on a blended NIR-NDVI image, whereas the coarse segmentation relied on the segment anything model (SAM). Both methods successfully delineated tree crown outlines; however, only the fine segmentation accurately captured internal canopy gaps. Despite these structural differences, mean NDVI values calculated per tree crown revealed no significant differences between the two approaches, indicating that coarse segmentation is sufficient for mean vegetation index assessments. Nevertheless, the fine segmentation revealed increased heterogeneity in NDVI values in more severely damaged trees, underscoring its value for detailed structural and health analyses. Furthermore, the fine segmentation workflow proved transferable to both individual UAV images and orthophotos from broader UAV surveys. For applications focused on structural integrity and spatial variation in canopy health, the fine segmentation approach is recommended.
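The per-crown comparison described above rests on the standard index NDVI = (NIR − Red) / (NIR + Red), summarised over the pixels of each crown mask. The following minimal sketch computes the mean and the spread of NDVI inside one mask; the function names, the small `eps` guard against division by zero, and the toy reflectance values are illustrative assumptions, not the study's workflow.

```python
import numpy as np


def ndvi(nir, red, eps=1e-12):
    """Normalised difference vegetation index, computed per pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)


def crown_ndvi_stats(nir, red, crown_mask):
    """Mean and standard deviation of NDVI inside one crown mask."""
    values = ndvi(nir, red)[np.asarray(crown_mask, dtype=bool)]
    return float(values.mean()), float(values.std())


# Toy 2x2 reflectance bands; the mask excludes the bottom-right pixel.
nir_band = [[0.6, 0.5], [0.4, 0.2]]
red_band = [[0.2, 0.1], [0.1, 0.2]]
mask = [[1, 1], [1, 0]]
mean_ndvi, ndvi_spread = crown_ndvi_stats(nir_band, red_band, mask)
```

The mean is what the abstract reports as similar between fine and coarse masks, while the standard deviation is one simple way to express the within-crown heterogeneity that the fine segmentation exposes.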
Funding: Supported by the Shenzhen Science and Technology Program (No. JCYJ20240813152704006), the National Natural Science Foundation of China (No. 62401259), the Fundamental Research Funds for the Central Universities (No. NZ2024036), the Postdoctoral Fellowship Program of CPSF (No. GZC20242228), and the High Performance Computing Platform of Nanjing University of Aeronautics and Astronautics.
Abstract: AIM: To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy (CSC) leakage points, thereby enabling ophthalmologists to deliver accurate laser treatment without navigational laser equipment. METHODS: A dataset with dual labels (point-level and pixel-level) was first established based on fundus fluorescein angiography (FFA) images of CSC and subsequently divided into training (102 images), validation (40 images), and test (40 images) datasets. An intelligent segmentation method was then developed, based on the You Only Look Once version 8 Pose Estimation (YOLOv8-Pose) model and the segment anything model (SAM), to segment CSC leakage points. Next, the YOLOv8-Pose model was trained for 200 epochs, and the best-performing model was selected to form the optimal combination with SAM. Additionally, five classic U-Net series models [i.e., U-Net, recurrent residual U-Net (R2U-Net), attention U-Net (AttU-Net), recurrent residual attention U-Net (R2AttUNet), and nested U-Net (UNet++)] were initialized with three random seeds and trained for 200 epochs, resulting in a total of 15 baseline models for comparison. Finally, based on metrics including the Dice similarity coefficient (DICE), intersection over union (IoU), precision, recall, precision-recall (PR) curve, and receiver operating characteristic (ROC) curve, the proposed method was compared with the baseline models through quantitative and qualitative experiments for leakage point segmentation, thereby demonstrating its effectiveness. RESULTS: As the number of training epochs increased, the mAP50-95, recall, and precision of the YOLOv8-Pose model rose significantly and then stabilized, and the model achieved a preliminary localization success rate of 90% (i.e., 36 images) for CSC leakage points in the 40 test images. Using expert-annotated pixel-level labels as the ground truth, the proposed method achieved a DICE of 57.13%, an IoU of 45.31%, a precision of 45.91%, a recall of 93.57%, an area under the PR curve (AUC-PR) of 0.78, and an area under the ROC curve (AUC-ROC) of 0.97, which enables more accurate segmentation of CSC leakage points. CONCLUSION: By combining the precise localization capability of the YOLOv8-Pose model with the robust and flexible segmentation ability of SAM, the proposed method demonstrates the effectiveness of the YOLOv8-Pose model in detecting keypoint coordinates of CSC leakage points and establishes a "detect-then-segment" approach for their accurate segmentation, thereby providing a potential auxiliary means for the automatic and precise real-time localization of leakage points during traditional laser photocoagulation for CSC.
Abstract: Weakly Supervised Semantic Segmentation (WSSS), which relies only on image-level labels, has attracted significant attention for its cost-effectiveness and scalability. Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations. However, they often neglect the complex contextual dependencies among image patches, resulting in incomplete local representations and limited segmentation accuracy. To address these issues, we propose the Context Patch Fusion with Class Token Enhancement (CPF-CTE) framework, which exploits contextual relations among patches to enrich feature representations and improve segmentation. At its core, the Contextual-Fusion Bidirectional Long Short-Term Memory (CF-BiLSTM) module captures spatial dependencies between patches and enables bidirectional information flow, yielding a more comprehensive understanding of spatial correlations. This strengthens feature learning and segmentation robustness. Moreover, we introduce learnable class tokens that dynamically encode and refine class-specific semantics, enhancing discriminative capability. By effectively integrating spatial and semantic cues, CPF-CTE produces richer and more accurate representations of image content. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.
Funding: Supported by the Shaanxi Provincial High-level Talent Introduction Project (5113220044); the Shaanxi Outstanding Youth Project (2023-JC-JQ-33); the Youth Science and Technology Talent Promotion Project of Jiangsu Association for Science and Technology (TJ-2022-088); the China Postdoctoral Science Foundation (2023TQ0273, 2023TQ0274, 2023M742833); the National Natural Science Foundation of China (62304181); the Natural Science Basic Research Program of Shaanxi (2023-JC-QN-0726, 2025JC-YBQN-469); the Guangdong Basic and Applied Basic Research Foundation (2022A1515110286, 2024A1515012538); the Basic Research Programs of Taicang (TC2024JC04); the Suzhou Science and Technology Development Plan Innovation Leading Talent Project (ZXL2023183); the Fundamental Research Funds for the Central Universities (G2022KY05108, G2024KY0605, G2023KY0601); and the Aeronautical Science Foundation of China (2018ZD53047).
Abstract: Achieving simultaneous enhancement of crystallinity and optimal domain size remains a fundamental challenge in organic photovoltaics (OPVs), where conventional crystallization strategies often trigger excessive aggregation of small-molecule acceptors. This work pioneers a kinetic paradigm for resolving the crystallinity-domain size trade-off in OPVs through dual-additive-guided stepwise crystallization. By strategically pairing 1,2-dichlorobenzene (o-DCB, low binding energy to Y6) and 1-fluoronaphthalene (FN, high binding energy), we achieve temporally decoupled crystallization control: o-DCB first mediates donor-acceptor co-crystallization during film formation, constructing a metastable network, whereupon FN induces confined Y6 crystallization within this framework during thermal annealing, refining the nanostructure without over-aggregation. Morphology studies reveal that this synergy enhances the crystallinity of the (100) diffraction peaks by 21%–10% versus single-additive controls (o-DCB/FN alone), while maintaining optimal domain size. These morphological advantages yield balanced carrier transport (μh/μe = 1.23), near-unity exciton dissociation (98.53%), and a champion power conversion efficiency (PCE) of 18.08% for PM6:Y6, significantly surpassing single-additive devices (o-DCB: 17.20%; FN: 17.53%). Crucially, the dual-additive strategy demonstrates universal applicability across diverse active layer systems, achieving an outstanding PCE of 19.27% in PM6:L8-BO-based devices, thereby establishing a general framework for morphology control in high-efficiency OPVs.
Abstract: Clock synchronization has important applications in multi-agent collaboration (such as drone light shows, intelligent transportation systems, and game AI), group decision-making, and emergency rescue operations. Synchronization methods based on pulse-coupled oscillators (PCOs) provide an effective solution for clock synchronization in wireless networks. However, existing clock synchronization algorithms for multi-agent ad hoc networks struggle to meet the high-precision, high-stability clock requirements of group cooperation. Hence, this paper constructs a network model named DAUNet (unsupervised neural network based on dual attention) to enhance clock synchronization accuracy in multi-agent wireless ad hoc networks. Specifically, we design an unsupervised distributed neural network framework as the backbone, building upon classical PCO-based synchronization methods. This framework resolves issues such as prolonged time synchronization message exchange between nodes, difficulties in centralized node coordination, and challenges in distributed training. Furthermore, we introduce a dual-attention mechanism as the core module of DAUNet. By integrating a Multi-Head Attention module and a Gated Attention module, the model significantly improves information extraction while reducing computational complexity, effectively mitigating synchronization inaccuracies and instability in multi-agent ad hoc networks. To evaluate the proposed model, comparative experiments and ablation studies were conducted against classical methods and existing deep learning models. The results show that, compared with deep learning networks based on DASA and LSTM, DAUNet reduces the mean normalized phase difference (NPD) by 1 to 2 orders of magnitude. Compared with attention models based on additive attention and self-attention mechanisms, DAUNet improves performance by more than a factor of ten. This study demonstrates DAUNet's potential in advancing multi-agent ad hoc networking technologies.
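For readers unfamiliar with the classical PCO-based synchronization that this abstract builds on, the following is a minimal sketch of a textbook-style pulse-coupled oscillator model. It is not DAUNet and not the paper's method. The all-to-all topology, the multiplicative phase jump, and the names `simulate_pco`, `circular_phase_difference`, and `coupling` are assumptions made for illustration, and the phase difference shown is a generic circular distance rather than the paper's NPD definition.

```python
TOL = 1e-12  # tolerance for deciding that a phase has reached the threshold 1.0


def circular_phase_difference(a, b):
    """Distance between two phases on the unit circle [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def simulate_pco(phases, coupling, rounds):
    """Run an event-driven, all-to-all pulse-coupled oscillator model.

    phases:   non-empty list of initial phases in [0, 1)
    coupling: assumed multiplicative jump; a pulse maps phase p to p * (1 + coupling)
    rounds:   number of firing events to simulate
    Returns the phases immediately after the last firing event.
    """
    phases = list(phases)
    n = len(phases)
    for _ in range(rounds):
        # All phases grow at the same rate until the leader hits the threshold.
        dt = 1.0 - max(phases)
        phases = [p + dt for p in phases]

        fired = [p >= 1.0 - TOL for p in phases]
        pending = [i for i in range(n) if fired[i]]
        while pending:
            pending.pop()  # deliver one pulse to every oscillator that has not fired
            for j in range(n):
                if not fired[j]:
                    phases[j] *= 1.0 + coupling
                    if phases[j] >= 1.0 - TOL:
                        # Pushed over the threshold: it fires in the same event.
                        fired[j] = True
                        pending.append(j)

        # Every oscillator that fired resets to phase zero together.
        phases = [0.0 if fired[j] else phases[j] for j in range(n)]
    return phases


# Two oscillators half a cycle apart lock together after five firing events.
final = simulate_pco([0.0, 0.5], coupling=0.5, rounds=5)
```

Once an oscillator is pushed over the threshold by a neighbour's pulse, it resets together with that neighbour and the two stay locked from then on, which is the absorption mechanism that drives synchronization in this simple model.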