Data augmentation plays an important role in training deep neural models by expanding the size and diversity of the dataset. Initially, data augmentation mainly involved simple transformations of images. Later, to increase the diversity and complexity of data, more advanced methods appeared and evolved into sophisticated generative models. However, these methods require a large amount of computation for training or searching. In this paper, a novel training-free method that utilises the pre-trained Segment Anything Model (SAM) as a data augmentation tool (PTSAM-DA) is proposed to generate augmented annotations for images. Without the need for training, it obtains prompt boxes from the original annotations and then feeds the boxes to the pre-trained SAM to generate diverse and improved annotations. In this way, annotations are augmented more ingeniously than with simple manipulations, without incurring the heavy computation of training a data augmentation model. Multiple comparative experiments are conducted on three datasets: an in-house dataset, ADE20K, and COCO2017. On the in-house dataset, namely the Agricultural Plot Segmentation Dataset, maximum improvements of 3.77% and 8.92% are gained in two mainstream metrics, mIoU and mAcc, respectively. Consequently, large vision models like SAM are shown to be promising not only in image segmentation but also in data augmentation.
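The abstract above says that prompt boxes are obtained from the original annotations, but it does not say how. As a minimal sketch, assuming each annotation is a binary mask and that the prompt is simply its tight bounding box, the step could look like the following. The function name `mask_to_box` and the XYXY ordering are illustrative assumptions, not the authors' implementation, and the call to SAM itself is deliberately left out.

```python
def mask_to_box(mask):
    """Return the tight bounding box (x_min, y_min, x_max, y_max) of a 2D binary mask.

    `mask` is a list of rows holding 0/1 values. Returns None for an empty mask.
    Hypothetical helper: the abstract does not specify how boxes are derived.
    """
    # Rows that contain at least one foreground pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Column indices of every foreground pixel.
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])


# Toy annotation mask: one object covering rows 1-2 and columns 1-3.
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
example_box = mask_to_box(example_mask)
```

A box derived this way would then be passed to a box-promptable segmenter; any jittering or enlargement of the box to obtain more varied annotations would be a further assumption beyond what the abstract states.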
Brain tumors present significant challenges in medical diagnosis and treatment, where early detection is crucial for reducing morbidity and mortality rates. This research introduces a novel deep learning model, the Progressive Layered U-Net (PLU-Net), designed to improve brain tumor segmentation accuracy from Magnetic Resonance Imaging (MRI) scans. The PLU-Net extends the standard U-Net architecture by incorporating progressive layering, attention mechanisms, and multi-scale data augmentation. The progressive layering involves a cascaded structure that refines segmentation masks across multiple stages, allowing the model to capture features at different scales and resolutions. Attention gates within the convolutional layers selectively focus on relevant features while suppressing irrelevant ones, enhancing the model's ability to delineate tumor boundaries. Additionally, multi-scale data augmentation techniques increase the diversity of training data and boost the model's generalization capabilities. Evaluated on the BraTS 2021 dataset, the PLU-Net achieved state-of-the-art performance with a Dice coefficient of 0.91, a specificity of 0.92, a sensitivity of 0.89, and a Hausdorff95 of 2.5, outperforming other modified U-Net architectures in segmentation accuracy. These results underscore the effectiveness of the PLU-Net in improving brain tumor segmentation from MRI scans, supporting clinicians in early diagnosis, treatment planning, and the development of new therapies.
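The Dice coefficient, sensitivity, and specificity reported above are all functions of the pixel-wise confusion counts between a predicted mask and a reference mask. The sketch below shows the standard definitions on flattened binary masks; the helper names and the toy data are illustrative and are not taken from the paper.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over two flat binary sequences."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def dice(tp, fp, fn):
    """Dice = 2TP / (2TP + FP + FN); defined as 1.0 when both masks are empty."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def sensitivity(tp, fn):
    """Sensitivity (recall) = TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Specificity = TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Toy flattened masks (1 = tumor, 0 = background).
pred_mask = [1, 1, 0, 0, 1, 0, 0, 0]
true_mask = [1, 1, 1, 0, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(pred_mask, true_mask)
```

On this toy example the counts are TP = 2, FP = 1, FN = 1, TN = 4, giving a Dice of 2/3, a sensitivity of 2/3, and a specificity of 0.8.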
Medical image segmentation, i.e., labeling structures of interest in medical images, is crucial for disease diagnosis and treatment in radiology. In reversible data hiding in medical images (RDHMI), segmentation consists of only two regions: the focal and nonfocal regions. The focal region mainly contains information for diagnosis, while the nonfocal region serves as the monochrome background. The traditional segmentation methods currently used in RDHMI are inaccurate for complex medical images, and manual segmentation is time-consuming, poorly reproducible, and operator-dependent. Implementing state-of-the-art deep learning (DL) models would bring key benefits, but the lack of domain-specific labels for existing medical datasets makes this impossible. To address this problem, this study provides labels for existing medical datasets based on a hybrid segmentation approach to facilitate the implementation of DL segmentation models in this domain. First, an initial segmentation based on a 3×3 kernel is performed to analyze identified contour pixels before classifying pixels into focal and nonfocal regions. Then, several human expert raters evaluate the generated labels and classify them as accurate or inaccurate. The inaccurate labels undergo manual segmentation by medical practitioners and are scored based on a hierarchical voting scheme before being assigned to the proposed dataset. To ensure reliability and integrity in the proposed dataset, we evaluate the accurate automated labels against labels manually segmented by medical practitioners using five assessment metrics: Dice coefficient, Jaccard index, precision, recall, and accuracy. The experimental results show that labels in the proposed dataset are consistent with the subjective judgment of human experts, with an average accuracy score of 94% and Dice coefficient scores between 90% and 99%. The study further proposes a ResNet-UNet with concatenated spatial and channel squeeze and excitation (scSE) architecture for semantic segmentation to validate and illustrate the usefulness of the proposed dataset. The results demonstrate the superior performance of the proposed architecture in accurately separating the focal and nonfocal regions compared to state-of-the-art architectures. Dataset information is released under the following URL: https://www.kaggle.com/lordamoah/datasets (accessed on 31 March 2025).
Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment. Their applications span from high-throughput drug screening to the modeling of complex diseases, with some even achieving clinical translation. Changes in the overall size, shape, boundary, and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity. However, the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and interference, including overlapping organoids, bubbles, dust particles, and cell fragments. This paper introduces the precision organoid segmentation technique (POST), which is a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions. Unlike existing methods, POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging. Furthermore, it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments. POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances and thereby streamlining the organoid analysis and drug screening process.
Reticular structures are the basis of major infrastructure projects, including bridges, electrical pylons, and airports. However, inspecting and maintaining these structures is both expensive and hazardous, traditionally requiring human involvement. While some research has been conducted in this field, most efforts focus on fault identification through images or the design of robotic platforms, often neglecting the autonomous navigation of robots through the structure. This study addresses this limitation by proposing methods to detect navigable surfaces in truss structures, thereby enhancing the autonomous capabilities of climbing robots to navigate through these environments. The paper proposes multiple approaches for the binary segmentation between navigable surfaces and background from 3D point clouds captured from metallic trusses. Approaches can be classified into two paradigms: analytical algorithms and deep learning methods. Within the analytical approach, an ad hoc algorithm is developed for segmenting the structures, leveraging different techniques to evaluate the eigendecomposition of planar patches within the point cloud. In parallel, widely used and advanced deep learning models, including PointNet, PointNet++, MinkUNet34C, and PointTransformerV3, are trained and evaluated for the same task. A comparative analysis of these paradigms reveals some key insights. The analytical algorithm demonstrates easier parameter adjustment and comparable performance to that of the deep learning models, despite the latter's higher computational demands. Nevertheless, the deep learning models stand out in segmentation accuracy, with PointTransformerV3 achieving impressive results, such as a Mean Intersection Over Union (mIoU) of approximately 97%. This study highlights the potential of analytical and deep learning approaches to improve the autonomous navigation of climbing robots in complex truss structures. The findings underscore the trade-offs between computational efficiency and segmentation performance, offering valuable insights for future research and practical applications in autonomous infrastructure maintenance and inspection.
Automatic surface defect detection is a critical technique for ensuring product quality in industrial casting production. While general object detection techniques have made remarkable progress over the past decade, casting surface defect detection still has considerable room for improvement. The lack of sufficient high-quality data has become one of the most challenging problems for casting surface defect detection. In this paper, we construct a new casting surface defect dataset (CSDD) containing 2100 high-resolution images of casting surface defects and 56356 defects in total. The class and defect region for each defect are manually labeled. We conduct a series of experiments on this dataset using multiple state-of-the-art object detection methods, establishing a comprehensive set of baselines. We also propose a defect detection method based on YOLOv5 with the global attention mechanism and partial convolution. Our proposed method achieves superior performance compared to other object detection methods. Additionally, we conduct a series of experiments with multiple state-of-the-art semantic segmentation methods, providing extensive baselines for defect segmentation. To the best of our knowledge, the CSDD has the largest number of defects for casting surface defect detection and segmentation. It should benefit both industrial vision research and manufacturing applications. Dataset and code are available at https://github.com/Kerio99/CSDD.
This research introduces a unique approach to segmenting breast cancer images using a U-Net-based architecture. The computational demand of image processing is very high. Therefore, we have conducted this research to build a system that enables image segmentation training on low-power machines. To accomplish this, all data are divided into several segments, each being trained separately. For prediction, an initial output is predicted by each trained model for a given input, and the ultimate output is selected by pixel-wise majority voting over these outputs, an arrangement that also ensures data privacy. In addition, this kind of distributed training system allows different computers to be used simultaneously, so the training process takes comparatively less time than typical training approaches. Even after training is complete, the proposed prediction system allows a newly trained model to be included in the system. Thus, the prediction is consistently more accurate. We evaluated the effectiveness of the ultimate output based on four performance metrics: average pixel accuracy, mean absolute error, average specificity, and average balanced accuracy. The experimental results show that the scores of average pixel accuracy, mean absolute error, average specificity, and average balanced accuracy are 0.9216, 0.0687, 0.9477, and 0.8674, respectively. In addition, the proposed method was compared with four other state-of-the-art models in terms of total training time and usage of computational resources, and it outperformed all of them in these aspects.
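The pixel-wise majority voting described above can be sketched as follows, assuming each separately trained model outputs a binary mask of the same size. The function name `majority_vote` and the rule that ties go to background are illustrative assumptions, since the abstract does not specify tie handling.

```python
def majority_vote(masks):
    """Combine binary masks (lists of rows of 0/1) by strict pixel-wise majority.

    A pixel is foreground only if more than half of the masks mark it so;
    ties fall to background (an assumption made for this sketch).
    """
    n_models = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [
            1 if 2 * sum(mask[r][c] for mask in masks) > n_models else 0
            for c in range(width)
        ]
        for r in range(height)
    ]


# Outputs of three hypothetical models for the same 2x2 input image.
model_outputs = [
    [[1, 0], [1, 1]],
    [[1, 1], [0, 1]],
    [[0, 0], [1, 1]],
]
fused_mask = majority_vote(model_outputs)
```

Because the vote only needs the per-model masks, adding a newly trained model amounts to appending one more mask to the list, which matches the extensibility the abstract describes.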
Automated prostate cancer detection in magnetic resonance imaging (MRI) scans is of significant importance for cancer patient management. Most existing computer-aided diagnosis systems adopt segmentation methods, while object detection approaches have recently shown promising results. The authors have (1) carefully compared the performance of the most developed segmentation and object detection methods in localising prostate imaging reporting and data system (PIRADS)-labelled prostate lesions on MRI scans; (2) proposed an additional customised set of lesion-level localisation sensitivity and precision metrics; and (3) proposed efficient ways to ensemble the segmentation and object detection methods for improved performance. The ground-truth (GT)-perspective lesion-level sensitivity and prediction-perspective lesion-level precision are reported, to quantify the ratios of true positive voxels detected by the algorithms over the number of voxels in the GT-labelled regions and predicted regions, respectively. The two networks are trained independently on data from 549 clinical patients with PIRADS-V2 as GT labels, and tested on 161 internal and 100 external MRI scans. At the lesion level, nnDetection outperforms nnUNet for detecting both PIRADS≥3 and PIRADS≥4 lesions in the majority of cases. For example, at an average of 3 false positive predictions per patient, nnDetection achieves a greater Intersection-over-Union (IoU)-based sensitivity than nnUNet for detecting PIRADS≥3 lesions, being 80.78% ± 1.50% versus 60.40% ± 1.64% (p<0.01). At the voxel level, nnUNet is in general superior or comparable to nnDetection. The proposed ensemble methods achieve improved or comparable lesion-level accuracy in all tested clinical scenarios. For example, at 3 false positives, the lesion-wise ensemble method achieves 82.24% ± 1.43% sensitivity versus 80.78% ± 1.50% (nnDetection) and 60.40% ± 1.64% (nnUNet) for detecting PIRADS≥3 lesions. Consistent conclusions are also drawn from results on the external data set.
Earthquakes are highly destructive spatio-temporal phenomena whose analysis is essential for disaster preparedness and risk mitigation. Modern seismological research produces vast volumes of heterogeneous data from seismic networks, satellite observations, and geospatial repositories, creating the need for scalable infrastructures capable of integrating and analyzing such data to support intelligent decision-making. Data warehousing technologies provide a robust foundation for this purpose; however, existing earthquake-oriented data warehouses remain limited, often relying on simplified schemas, domain-specific analytics, or cataloguing efforts. This paper presents the design and implementation of a spatio-temporal data warehouse for seismic activity. The framework integrates spatial and temporal dimensions in a unified schema and introduces a novel array-based approach for managing many-to-many relationships between facts and dimensions without intermediate bridge tables. A comparative evaluation against a conventional bridge-table schema demonstrates that the array-based design improves fact-centric query performance, while the bridge-table schema remains advantageous for dimension-centric queries. To reconcile these trade-offs, a hybrid schema is proposed that retains both representations, ensuring balanced efficiency across heterogeneous workloads. The proposed framework demonstrates how spatio-temporal data warehousing can address schema complexity, improve query performance, and support multidimensional visualization. In doing so, it provides a foundation for integrating seismic analysis into broader big data-driven intelligent decision systems for disaster resilience, risk mitigation, and emergency management.
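To make the two representations of a many-to-many relationship concrete, the sketch below models them with plain Python structures. Everything here is hypothetical: the fact and region identifiers, the field names, and the lookup helpers are invented for illustration, and the sketch only shows that both layouts answer the same questions. It does not model the query-performance differences the abstract reports.

```python
# Array-based layout: each fact row carries the keys of its related
# dimension members directly (hypothetical earthquake facts and region ids).
facts_array = {
    101: {"magnitude": 5.8, "region_ids": [1, 2]},
    102: {"magnitude": 4.3, "region_ids": [2]},
}

# Bridge-table layout: one (fact_id, region_id) row per relationship.
bridge_rows = [(101, 1), (101, 2), (102, 2)]


def regions_of_fact_array(fact_id):
    """Fact-centric lookup: one keyed access, no join through a bridge."""
    return list(facts_array[fact_id]["region_ids"])


def regions_of_fact_bridge(fact_id):
    """Fact-centric lookup through the bridge rows."""
    return [region for fact, region in bridge_rows if fact == fact_id]


def facts_in_region_array(region_id):
    """Dimension-centric lookup: must inspect the array stored on each fact."""
    return [f for f, row in facts_array.items() if region_id in row["region_ids"]]


def facts_in_region_bridge(region_id):
    """Dimension-centric lookup: a filter on one bridge column."""
    return [fact for fact, region in bridge_rows if region == region_id]
```

A hybrid schema in the sense of the abstract would keep both `facts_array` and `bridge_rows` and route each query to whichever representation suits it, at the cost of storing the relationship twice.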
Detailed individual tree crown segmentation is highly relevant for the detection and monitoring of Fraxinus excelsior L. trees affected by ash dieback, a major threat to common ash populations across Europe. In this study, both fine and coarse crown segmentation methods were applied to close-range multispectral UAV imagery. The fine tree crown segmentation method utilized a novel unsupervised machine learning approach based on a blended NIR-NDVI image, whereas the coarse segmentation relied on the segment anything model (SAM). Both methods successfully delineated tree crown outlines; however, only the fine segmentation accurately captured internal canopy gaps. Despite these structural differences, mean NDVI values calculated per tree crown revealed no significant differences between the two approaches, indicating that coarse segmentation is sufficient for mean vegetation index assessments. Nevertheless, the fine segmentation revealed increased heterogeneity in NDVI values in more severely damaged trees, underscoring its value for detailed structural and health analyses. Furthermore, the fine segmentation workflow proved transferable to both individual UAV images and orthophotos from broader UAV surveys. For applications focused on structural integrity and spatial variation in canopy health, the fine segmentation approach is recommended.
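The per-crown NDVI statistics discussed above follow from the standard definition NDVI = (NIR - Red) / (NIR + Red), evaluated only over the pixels inside a crown mask. The sketch below is a minimal illustration on toy reflectance values. The function name `crown_ndvi_stats` and the use of the population standard deviation as a heterogeneity measure are assumptions, not the study's stated procedure.

```python
import math


def crown_ndvi_stats(nir, red, mask):
    """Return (mean, standard deviation) of NDVI over the masked crown pixels.

    `nir`, `red`, and `mask` are equally sized lists of rows. Pixels where
    NIR + Red is zero are skipped to avoid division by zero.
    """
    values = []
    for nir_row, red_row, mask_row in zip(nir, red, mask):
        for n, r, inside in zip(nir_row, red_row, mask_row):
            if inside and (n + r) != 0:
                values.append((n - r) / (n + r))
    if not values:
        return None
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


# Toy 2x2 reflectance bands and a crown mask that excludes one pixel,
# standing in for a canopy gap captured by a fine segmentation.
nir_band = [[0.8, 0.6], [0.5, 0.4]]
red_band = [[0.2, 0.2], [0.5, 0.1]]
crown_mask = [[1, 1], [0, 1]]
mean_ndvi, ndvi_spread = crown_ndvi_stats(nir_band, red_band, crown_mask)
```

Running the same function with a coarse mask that includes gap pixels would change the spread more readily than the mean, which is in line with the abstract's observation that mean NDVI barely differs between the two segmentations.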
AIM: To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy (CSC) leakage points, thereby enabling ophthalmologists to deliver accurate laser treatment without navigational laser equipment. METHODS: A dataset with dual labels (point-level and pixel-level) was first established based on fundus fluorescein angiography (FFA) images of CSC and subsequently divided into training (102 images), validation (40 images), and test (40 images) datasets. An intelligent segmentation method was then developed, based on the You Only Look Once version 8 Pose Estimation (YOLOv8-Pose) model and the segment anything model (SAM), to segment CSC leakage points. Next, the YOLOv8-Pose model was trained for 200 epochs, and the best-performing model was selected to form the optimal combination with SAM. Additionally, five classic U-Net series models [i.e., U-Net, recurrent residual U-Net (R2U-Net), attention U-Net (AttU-Net), recurrent residual attention U-Net (R2AttUNet), and nested U-Net (UNet++)] were initialized with three random seeds and trained for 200 epochs, resulting in a total of 15 baseline models for comparison. Finally, based on metrics including Dice similarity coefficient (DICE), intersection over union (IoU), precision, recall, precision-recall (PR) curve, and receiver operating characteristic (ROC) curve, the proposed method was compared with baseline models through quantitative and qualitative experiments for leakage point segmentation, thereby demonstrating its effectiveness. RESULTS: With the increase of training epochs, the mAP50-95, recall, and precision of the YOLOv8-Pose model increased significantly and then stabilized, and the model achieved a preliminary localization success rate of 90% (i.e., 36 images) for CSC leakage points in the 40 test images. Using pixel-level labels manually annotated by experts as the ground truth, the proposed method achieved a DICE of 57.13%, an IoU of 45.31%, a precision of 45.91%, a recall of 93.57%, an area under the PR curve (AUC-PR) of 0.78, and an area under the ROC curve (AUC-ROC) of 0.97, which enables more accurate segmentation of CSC leakage points. CONCLUSION: By combining the precise localization capability of the YOLOv8-Pose model with the robust and flexible segmentation ability of SAM, the proposed method not only demonstrates the effectiveness of the YOLOv8-Pose model in detecting keypoint coordinates of CSC leakage points from the perspective of application innovation but also establishes a novel approach for accurate segmentation of CSC leakage points through the "detect-then-segment" strategy, thereby providing a potential auxiliary means for the automatic and precise real-time localization of leakage points during traditional laser photocoagulation for CSC.
Medical image segmentation is of critical importance in the domain of contemporary medical imaging. However, U-Net and its variants exhibit limitations in capturing complex nonlinear patterns and global contextual information. Although the subsequent U-KAN model enhances nonlinear representation capabilities, it still faces challenges such as gradient vanishing during deep network training and spatial detail loss during feature downsampling, resulting in insufficient segmentation accuracy for edge structures and minute lesions. To address these challenges, this paper proposes the RE-UKAN model, which innovatively improves upon U-KAN. Firstly, a residual network is introduced into the encoder to effectively mitigate gradient vanishing through cross-layer identity mappings, thus enhancing modelling capabilities for complex pathological structures. Secondly, Efficient Local Attention (ELA) is integrated to suppress spatial detail loss during downsampling, thereby improving the perception of edge structures and minute lesions. Experimental results on four public datasets demonstrate that RE-UKAN outperforms existing medical image segmentation methods across multiple evaluation metrics, with particularly outstanding performance on the TN-SCUI 2020 dataset, achieving an IoU of 88.18% and a Dice of 93.57%. Compared to the baseline model, it achieves improvements of 3.05% and 1.72%, respectively. These results fully demonstrate RE-UKAN's superior detail retention capability and boundary recognition accuracy in complex medical image segmentation tasks, providing a reliable solution for clinical precision segmentation.
Weakly Supervised Semantic Segmentation (WSSS), which relies only on image-level labels, has attracted significant attention for its cost-effectiveness and scalability. Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations. However, they often neglect the complex contextual dependencies among image patches, resulting in incomplete local representations and limited segmentation accuracy. To address these issues, we propose the Context Patch Fusion with Class Token Enhancement (CPF-CTE) framework, which exploits contextual relations among patches to enrich feature representations and improve segmentation. At its core, the Contextual-Fusion Bidirectional Long Short-Term Memory (CF-BiLSTM) module captures spatial dependencies between patches and enables bidirectional information flow, yielding a more comprehensive understanding of spatial correlations. This strengthens feature learning and segmentation robustness. Moreover, we introduce learnable class tokens that dynamically encode and refine class-specific semantics, enhancing discriminative capability. By effectively integrating spatial and semantic cues, CPF-CTE produces richer and more accurate representations of image content. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.
Quantitative analysis of aluminum-silicon (Al-Si) alloy microstructure is crucial for evaluating and controlling alloy performance. Conventional analysis methods rely on manual segmentation, which is inefficient and subjective, while fully supervised deep learning approaches require extensive and expensive pixel-level annotated data. Furthermore, existing semi-supervised methods still face challenges in handling the adhesion of adjacent primary silicon particles and effectively utilizing consistency in unlabeled data. To address these issues, this paper proposes a novel semi-supervised framework for Al-Si alloy microstructure image segmentation. First, we introduce a Rotational Uncertainty Correction Strategy (RUCS). This strategy employs multi-angle rotational perturbations and Monte Carlo sampling to assess prediction consistency, generating a pixel-wise confidence weight map. By integrating this map into the loss function, the model dynamically focuses on high-confidence regions, thereby improving generalization ability while reducing manual annotation pressure. Second, we design a Boundary Enhancement Module (BEM) to strengthen boundary feature extraction through erosion difference and multi-scale dilated convolutions. This module guides the model to focus on the boundary regions of adjacent particles, effectively resolving particle adhesion and improving segmentation accuracy. Systematic experiments were conducted on the Aluminum-Silicon Alloy Microstructure Dataset (ASAD). Results indicate that the proposed method performs exceptionally well with scarce labeled data. Specifically, using only 5% labeled data, our method improves the Jaccard index and Adjusted Rand Index (ARI) by 2.84 and 1.57 percentage points, respectively, and reduces the Variation of Information (VI) by 8.65 compared to state-of-the-art semi-supervised models, approaching the performance levels of 10% labeled data. These results demonstrate that the proposed method significantly enhances the accuracy and robustness of quantitative microstructure analysis while reducing annotation costs.
Autonomous vehicles rely heavily on accurate and efficient scene segmentation for safe navigation and efficient operations. Traditional Bird's Eye View (BEV) methods for semantic scene segmentation, which leverage multimodal sensor fusion, often struggle with noisy data and demand high-performance GPUs, leading to sensor misalignment and performance degradation. This paper introduces Enhanced Channel Attention BEV (ECABEV), a novel approach designed to address these challenges under insufficient GPU memory conditions. ECABEV integrates camera and radar data through a de-noise enhanced channel attention mechanism, which utilizes global average and max pooling to effectively filter out noise while preserving discriminative features. Furthermore, an improved fusion approach is proposed to efficiently merge categorical data across modalities. To reduce computational overhead, a bilinear interpolation layer normalization method is devised to ensure spatial feature fidelity. Moreover, a scalable cross-entropy loss function is designed to handle imbalanced classes with less sacrifice of computational efficiency. Extensive experiments on the nuScenes dataset demonstrate that ECABEV achieves state-of-the-art performance with an IoU of 39.961, using a lightweight ViT-B/14 backbone and lower resolution (224×224). Our approach highlights its cost-effectiveness and practical applicability, even on low-end devices. The code is publicly available at: https://github.com/YYF-CQU/ECABEV.git.
Accurately assessing the relationship between tree growth and climatic factors is of great importance in dendrochronology. This study evaluated the consistency between alternative climate datasets (including station and gridded data) and actual climate data (fixed-point observations near the sampling sites) in northeastern China's warm temperate zone and analyzed differences in their correlations with the tree-ring width index. The results were: (1) Gridded temperature data, as well as precipitation and relative humidity data from the Huailai meteorological station, were more consistent with the actual climate data; in contrast, gridded soil moisture content data showed significant discrepancies. (2) Horizontal distance had a greater impact on the representativeness of actual climate conditions than vertical elevation differences. (3) Differences in consistency between alternative and actual climate data also affected their correlations with tree-ring width indices. In some growing season months, correlation coefficients, both in magnitude and sign, differed significantly from those based on actual data. The selection of different alternative climate datasets can lead to biased results in assessing forest responses to climate change, which is detrimental to the management of forest ecosystems in harsh environments. Therefore, the scientific and rational selection of alternative climate data is essential for dendroecological and climatological research.
High-resolution remote sensing images (HRSIs) are now an essential data source for gathering surface information due to advancements in remote sensing data capture technologies. However, their significant scale changes and wealth of spatial details pose challenges for semantic segmentation. While convolutional neural networks (CNNs) excel at capturing local features, they are limited in modeling long-range dependencies. Conversely, transformers utilize multihead self-attention to integrate global context effectively, but this approach often incurs a high computational cost. This paper proposes a global-local multiscale context network (GLMCNet) to extract both global and local multiscale contextual information from HRSIs. A detail-enhanced filtering module (DEFM) is proposed at the end of the encoder to refine the encoder outputs further, thereby enhancing the key details extracted by the encoder and effectively suppressing redundant information. In addition, a global-local multiscale transformer block (GLMTB) is proposed in the decoding stage to enable the modeling of rich multiscale global and local information. We also design a stair fusion mechanism to transmit deep semantic information from deep to shallow layers progressively. Finally, we propose the semantic awareness enhancement module (SAEM), which further enhances the representation of multiscale semantic features through spatial attention and covariance channel attention. Extensive ablation analyses and comparative experiments were conducted to evaluate the performance of the proposed method. Specifically, our method achieved a mean Intersection over Union (mIoU) of 86.89% on the ISPRS Potsdam dataset and 84.34% on the ISPRS Vaihingen dataset, outperforming existing models such as ABCNet and BANet.
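The mean Intersection over Union (mIoU) quoted above is the per-class IoU averaged over classes. As a minimal sketch on flattened label maps, assuming classes absent from both prediction and reference are skipped (one common convention, not necessarily the one used in the paper), it can be computed as follows. The function name `mean_iou` is illustrative.

```python
def mean_iou(pred, truth, num_classes):
    """Average per-class IoU over flat label sequences.

    For each class, IoU = |pred == c AND truth == c| / |pred == c OR truth == c|.
    Classes that appear in neither sequence are left out of the average.
    """
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label maps with three classes.
pred_labels = [0, 0, 1, 1, 2, 2]
true_labels = [0, 1, 1, 1, 2, 0]
miou_value = mean_iou(pred_labels, true_labels, num_classes=3)
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mIoU is 0.5.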
Microscopy imaging is fundamental in analyzing bacterial morphology and dynamics, offering critical insights into bacterial physiology and pathogenicity. Image segmentation techniques enable quantitative analysis of bacterial structures, facilitating precise measurement of morphological variations and population behaviors at single-cell resolution. This paper reviews advancements in bacterial image segmentation, emphasizing the shift from traditional thresholding and watershed methods to deep learning-driven approaches. Convolutional neural networks (CNNs), U-Net architectures, and three-dimensional (3D) frameworks excel at segmenting dense biofilms and resolving antibiotic-induced morphological changes. These methods combine automated feature extraction with physics-informed postprocessing. Despite progress, challenges persist in computational efficiency, cross-species generalizability, and integration with multimodal experimental workflows. Future progress will depend on improving model robustness across species and imaging modalities, integrating multimodal data for phenotype-function mapping, and developing standard pipelines that link computational tools with clinical diagnostics. These innovations will expand microbial phenotyping beyond structural analysis, enabling deeper insights into bacterial physiology and ecological interactions.
Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods, which often demand extensive computational resources and struggle with diverse data acquisition techniques. This research presents a novel approach for vehicle classification and recognition in aerial image sequences, integrating multiple advanced techniques to enhance detection accuracy. The proposed model begins with preprocessing using Multiscale Retinex (MSR) to enhance image quality, followed by Expectation-Maximization (EM) Segmentation for precise foreground object identification. Vehicle detection is performed using the state-of-the-art YOLOv10 framework, while feature extraction incorporates Maximally Stable Extremal Regions (MSER), Dense Scale-Invariant Feature Transform (Dense SIFT), and Zernike Moments Features to capture distinct object characteristics. Feature optimization is further refined through a Hybrid Swarm-based Optimization algorithm, ensuring optimal feature selection for improved classification performance. The final classification is conducted using a Vision Transformer, leveraging its robust learning capabilities for enhanced accuracy. Experimental evaluations on benchmark datasets, including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset (UAVID), demonstrate the superiority of the proposed approach, achieving an accuracy of 94.40% on UAVDT and 93.57% on UAVID. The results highlight the efficacy of the model in significantly enhancing vehicle detection and classification in aerial imagery, outperforming existing methodologies and offering a statistically validated improvement for intelligent traffic monitoring systems.
Weakly supervised semantic segmentation (WSSS) is a tricky task in which only category information is provided for segmentation prediction. Thus, the key stage of WSSS is generating the pseudo labels. Convolutional neural network (CNN)-based methods obtain the pseudo labels with class activation mapping (CAM), which concentrates only on the most discriminative parts. Recently, transformer-based methods have used attention maps from the multi-headed self-attention (MHSA) module to predict pseudo labels, which usually contain obvious background noise and incoherent object areas. To solve these problems, we use the Conformer as our backbone, a parallel network based on a CNN and a Transformer. The two branches generate pseudo labels and refine them independently, and can effectively combine the advantages of the CNN and the Transformer. However, information exchange between the parallel branches is not tight enough, so the pseudo labels can lack detail and background noise still remains. To alleviate this problem, we propose the enhancing convolution CAM (ECCAM) model, which has three improved modules based on enhancing convolution: a deeper stem (DStem), a convolutional feed-forward network (CFFN), and a feature coupling unit with convolution (FCUConv). ECCAM gives the Conformer tighter interaction between the CNN and Transformer branches. Experiments verify that the proposed modules help the network perceive more local information from images, making the final segmentation results more refined. Compared with similar architectures, our modules greatly improve semantic segmentation performance and achieve 70.2% mean intersection over union (mIoU) on the PASCAL VOC 2012 dataset.
Funding: Natural Science Foundation of Zhejiang Province, Grant/Award Number: LY23F020025; Science and Technology Commissioner Program of Huzhou, Grant/Award Number: 2023GZ42; Sichuan Provincial Science and Technology Support Program, Grant/Award Numbers: 2023ZHCG0005, 2023ZHCG0008.
Abstract: Data augmentation plays an important role in training deep neural models by expanding the size and diversity of the dataset. Initially, data augmentation mainly involved simple transformations of images. Later, to increase the diversity and complexity of data, more advanced methods appeared and evolved into sophisticated generative models. However, these methods require a large amount of computation for training or searching. In this paper, a novel training-free method that utilises the pre-trained Segment Anything Model (SAM) as a data augmentation tool (PTSAM-DA) is proposed to generate augmented annotations for images. Without the need for training, it obtains prompt boxes from the original annotations and then feeds the boxes to the pre-trained SAM to generate diverse and improved annotations. In this way, annotations are augmented more ingeniously than with simple manipulations, without incurring the heavy computation of training a data augmentation model. Multiple comparative experiments are conducted on three datasets: an in-house dataset, ADE20K and COCO2017. On the in-house dataset, namely the Agricultural Plot Segmentation Dataset, maximum improvements of 3.77% and 8.92% are gained in two mainstream metrics, mIoU and mAcc, respectively. Consequently, large vision models like SAM are shown to be promising not only in image segmentation but also in data augmentation.
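The step of obtaining prompt boxes from existing annotations can be sketched in a few lines. The snippet below is a minimal illustration, not the PTSAM-DA implementation: it assumes an annotation is a binary mask stored as nested lists, and the function name `mask_to_prompt_box` and its `margin` parameter are hypothetical. Passing the resulting box to SAM is deliberately left out.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]           # binary annotation mask, rows of 0/1
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def mask_to_prompt_box(mask: Mask, margin: int = 0) -> Optional[Box]:
    """Return the tight bounding box around the foreground pixels of `mask`.

    `margin` (a hypothetical parameter) pads the box by a few pixels and is
    clipped to the image bounds. Returns None when the mask has no foreground.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    height, width = len(mask), len(mask[0])
    return (
        max(min(xs) - margin, 0),
        max(min(ys) - margin, 0),
        min(max(xs) + margin, width - 1),
        min(max(ys) + margin, height - 1),
    )


example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = mask_to_prompt_box(example_mask)                   # tight box
padded_box = mask_to_prompt_box(example_mask, margin=1)  # box padded by one pixel
```

A box in this corner format is the kind of input a box-prompted segmentation model expects; how many boxes are drawn per annotation and how they are perturbed is a design choice the abstract does not specify.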
Abstract: Brain tumors present significant challenges in medical diagnosis and treatment, where early detection is crucial for reducing morbidity and mortality rates. This research introduces a novel deep learning model, the Progressive Layered U-Net (PLU-Net), designed to improve brain tumor segmentation accuracy from Magnetic Resonance Imaging (MRI) scans. The PLU-Net extends the standard U-Net architecture by incorporating progressive layering, attention mechanisms, and multi-scale data augmentation. The progressive layering involves a cascaded structure that refines segmentation masks across multiple stages, allowing the model to capture features at different scales and resolutions. Attention gates within the convolutional layers selectively focus on relevant features while suppressing irrelevant ones, enhancing the model's ability to delineate tumor boundaries. Additionally, multi-scale data augmentation techniques increase the diversity of training data and boost the model's generalization capabilities. Evaluated on the BraTS 2021 dataset, the PLU-Net achieved state-of-the-art performance with a dice coefficient of 0.91, specificity of 0.92, sensitivity of 0.89, and Hausdorff95 of 2.5, outperforming other modified U-Net architectures in segmentation accuracy. These results underscore the effectiveness of the PLU-Net in improving brain tumor segmentation from MRI scans, supporting clinicians in early diagnosis, treatment planning, and the development of new therapies.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62072250, 61772281, 61702235, U1636117, U1804263, 62172435, 61872203 and 61802212); the Zhongyuan Science and Technology Innovation Leading Talent Project of China (Grant No. 214200510019); the Suqian Municipal Science and Technology Plan Project in 2020 (S202015); the Plan for Scientific Talent of Henan Province (Grant No. 2018JR0018); the Opening Project of Guangdong Provincial Key Laboratory of Information Security Technology (Grant No. 2020B1212060078); and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) Fund.
Abstract: Medical image segmentation, i.e., labeling structures of interest in medical images, is crucial for disease diagnosis and treatment in radiology. In reversible data hiding in medical images (RDHMI), segmentation consists of only two regions: the focal and nonfocal regions. The focal region mainly contains information for diagnosis, while the nonfocal region serves as the monochrome background. The traditional segmentation methods currently used in RDHMI are inaccurate for complex medical images, and manual segmentation is time-consuming, poorly reproducible, and operator-dependent. Implementing state-of-the-art deep learning (DL) models would bring key benefits, but the lack of domain-specific labels for existing medical datasets makes this impossible. To address this problem, this study provides labels for existing medical datasets based on a hybrid segmentation approach to facilitate the implementation of DL segmentation models in this domain. First, an initial segmentation based on a 3×3 kernel is performed to analyze identified contour pixels before classifying pixels into focal and nonfocal regions. Then, several human expert raters evaluate and classify the generated labels into accurate and inaccurate labels. The inaccurate labels undergo manual segmentation by medical practitioners and are scored based on a hierarchical voting scheme before being assigned to the proposed dataset. To ensure reliability and integrity in the proposed dataset, we evaluate the accurate automated labels against labels manually segmented by medical practitioners using five assessment metrics: dice coefficient, Jaccard index, precision, recall, and accuracy. The experimental results show that labels in the proposed dataset are consistent with the subjective judgment of human experts, with an average accuracy score of 94% and dice coefficient scores between 90% and 99%. The study further proposes a ResNet-UNet with concatenated spatial and channel squeeze and excitation (scSE) architecture for semantic segmentation to validate and illustrate the usefulness of the proposed dataset. The results demonstrate the superior performance of the proposed architecture in accurately separating the focal and nonfocal regions compared to state-of-the-art architectures. Dataset information is released under the following URL: https://www.kaggle.com/lordamoah/datasets (accessed on 31 March 2025).
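The five assessment metrics named above all follow from the pixel-level confusion counts of a binary (focal versus nonfocal) labeling. The following is a minimal sketch under the assumption that both labelings are flattened to sequences of 0/1; the function name `binary_segmentation_metrics` is illustrative and the convention of returning 0.0 for an empty denominator is a choice made here, not something stated in the study.

```python
from typing import Dict, Sequence


def binary_segmentation_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Compute dice, Jaccard, precision, recall and accuracy for 0/1 labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of pixels")

    # Pixel-level confusion counts (1 = focal region, 0 = nonfocal region).
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    def ratio(numerator: int, denominator: int) -> float:
        # Assumed convention: report 0.0 when the denominator is empty.
        return numerator / denominator if denominator else 0.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


automated_label = [1, 1, 1, 0, 0, 0, 1, 0]  # toy automated labeling
manual_label = [1, 1, 0, 0, 0, 1, 1, 0]     # toy manual labeling
metrics = binary_segmentation_metrics(automated_label, manual_label)
```

Dice and Jaccard are monotonically related (dice = 2J / (1 + J)), so they rank labelings identically; reporting both mainly eases comparison with other work.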
Funding: Supported by the National Key R&D Program of China (No. 2022YFC2504403); the National Natural Science Foundation of China (No. 62172202); the Experiment Project of China Manned Space Program (No. HYZHXM01019); and the Fundamental Research Funds for the Central Universities from Southeast University (No. 3207032101C3).
Abstract: Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment. Their applications span from high-throughput drug screening to the modeling of complex diseases, with some even achieving clinical translation. Changes in the overall size, shape, boundary, and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity. However, the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and interference, including overlapping organoids, bubbles, dust particles, and cell fragments. This paper introduces the precision organoid segmentation technique (POST), which is a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions. Unlike existing methods, POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging. Furthermore, it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments. POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances and thereby streamlining the organoid analysis and drug screening process.
Funding: Funded by the Spanish Ministry of Science, Innovation and Universities as part of the project PID2020-116418RB-I00 funded by MCIN/AEI/10.13039/501100011033.
Abstract: Reticular structures are the basis of major infrastructure projects, including bridges, electrical pylons and airports. However, inspecting and maintaining these structures is both expensive and hazardous, traditionally requiring human involvement. While some research has been conducted in this field, most efforts focus on fault identification through images or the design of robotic platforms, often neglecting the autonomous navigation of robots through the structure. This study addresses this limitation by proposing methods to detect navigable surfaces in truss structures, thereby enhancing the autonomous capabilities of climbing robots to navigate through these environments. The paper proposes multiple approaches for the binary segmentation between navigable surfaces and background from 3D point clouds captured from metallic trusses. The approaches can be classified into two paradigms: analytical algorithms and deep learning methods. Within the analytical approach, an ad hoc algorithm is developed for segmenting the structures, leveraging different techniques to evaluate the eigendecomposition of planar patches within the point cloud. In parallel, widely used and advanced deep learning models, including PointNet, PointNet++, MinkUNet34C, and PointTransformerV3, are trained and evaluated for the same task. A comparative analysis of these paradigms reveals some key insights. The analytical algorithm offers easier parameter adjustment and performance comparable to that of the deep learning models, despite the latter's higher computational demands. Nevertheless, the deep learning models stand out in segmentation accuracy, with PointTransformerV3 achieving impressive results, such as a Mean Intersection Over Union (mIoU) of approximately 97%. This study highlights the potential of analytical and deep learning approaches to improve the autonomous navigation of climbing robots in complex truss structures. The findings underscore the trade-offs between computational efficiency and segmentation performance, offering valuable insights for future research and practical applications in autonomous infrastructure maintenance and inspection.
Funding: Supported by the National Natural Science Foundation of China (U23B2060, 62088102) and the Key Research and Development Program of China (2020AAA0108305).
Abstract: Automatic surface defect detection is a critical technique for ensuring product quality in industrial casting production. While general object detection techniques have made remarkable progress over the past decade, casting surface defect detection still has considerable room for improvement. The lack of sufficient high-quality data has become one of the most challenging problems for casting surface defect detection. In this paper, we construct a new casting surface defect dataset (CSDD) containing 2100 high-resolution images of casting surface defects and 56356 defects in total. The class and defect region of each defect are manually labeled. We conduct a series of experiments on this dataset using multiple state-of-the-art object detection methods, establishing a comprehensive set of baselines. We also propose a defect detection method based on YOLOv5 with the global attention mechanism and partial convolution. Our proposed method achieves superior performance compared to other object detection methods. We also conduct a series of experiments with multiple state-of-the-art semantic segmentation methods, providing extensive baselines for defect segmentation. To the best of our knowledge, the CSDD has the largest number of defects for casting surface defect detection and segmentation. It should benefit both industrial vision research and manufacturing applications. Dataset and code are available at https://github.com/Kerio99/CSDD.
Funding: Funded by the Researchers Supporting Project, King Saud University, Saudi Arabia, through Project No. RSPD2025R951.
Abstract: This research introduces a unique approach to segmenting breast cancer images using a U-Net-based architecture. However, the computational demand for image processing is very high. Therefore, we have conducted this research to build a system that enables image segmentation training on low-power machines. To accomplish this, the data are divided into several segments, each of which is used to train a separate model. For prediction, an initial output is predicted by each trained model for a given input, and the ultimate output is selected by pixel-wise majority voting over the predicted outputs, which also ensures data privacy. In addition, this kind of distributed training system allows different computers to be used simultaneously, so the training process takes comparatively less time than typical training approaches. Even after training is complete, the proposed prediction system allows a newly trained model to be included in the system. Thus, the prediction is consistently more accurate. We evaluated the effectiveness of the ultimate output based on four performance metrics: average pixel accuracy, mean absolute error, average specificity, and average balanced accuracy. The experimental results show that the scores of average pixel accuracy, mean absolute error, average specificity, and average balanced accuracy are 0.9216, 0.0687, 0.9477, and 0.8674, respectively. In addition, the proposed method was compared with four other state-of-the-art models in terms of total training time and usage of computational resources, and it outperformed all of them in these aspects.
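Pixel-wise majority voting over the outputs of separately trained models can be sketched as follows. This is an illustrative sketch only: it assumes each model emits a binary mask as nested lists of equal shape, the function name `majority_vote` is hypothetical, and resolving ties to background (0) is an assumption made here rather than a rule stated in the abstract.

```python
from typing import List

Mask = List[List[int]]  # binary mask, rows of 0/1


def majority_vote(masks: List[Mask]) -> Mask:
    """Combine binary masks pixel by pixel; a pixel is 1 if more than half vote 1."""
    if not masks:
        raise ValueError("at least one mask is required")
    height, width = len(masks[0]), len(masks[0][0])
    for mask in masks:
        if len(mask) != height or any(len(row) != width for row in mask):
            raise ValueError("all masks must share the same shape")

    n_models = len(masks)
    # Strict majority: ties (possible with an even number of models) become 0.
    return [
        [1 if 2 * sum(mask[y][x] for mask in masks) > n_models else 0 for x in range(width)]
        for y in range(height)
    ]


model_a = [[1, 0], [1, 1]]
model_b = [[1, 1], [0, 1]]
model_c = [[0, 0], [0, 1]]
fused = majority_vote([model_a, model_b, model_c])
```

Because the vote only needs each model's output, a newly trained model can be added by appending its mask to the list, without retraining the others.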
Funding: National Natural Science Foundation of China, Grant/Award Number: 62303275; International Alliance for Cancer Early Detection, Grant/Award Numbers: C28070/A30912, C73666/A31378; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, Grant/Award Number: 203145Z/16/Z.
Abstract: Automated prostate cancer detection in magnetic resonance imaging (MRI) scans is of significant importance for cancer patient management. Most existing computer-aided diagnosis systems adopt segmentation methods, while object detection approaches have recently shown promising results. The authors have (1) carefully compared the performance of the most developed segmentation and object detection methods in localising prostate imaging reporting and data system (PIRADS)-labelled prostate lesions on MRI scans; (2) proposed an additional customised set of lesion-level localisation sensitivity and precision; and (3) proposed efficient ways to ensemble the segmentation and object detection methods for improved performance. The ground-truth (GT)-perspective lesion-level sensitivity and prediction-perspective lesion-level precision are reported to quantify the ratios of true positive voxels detected by the algorithms over the number of voxels in the GT-labelled regions and predicted regions. The two networks are trained independently on data from 549 clinical patients with PIRADS-V2 as GT labels, and tested on 161 internal and 100 external MRI scans. At the lesion level, nnDetection outperforms nnUNet for detecting both PIRADS≥3 and PIRADS≥4 lesions in the majority of cases. For example, at an average of 3 false positive predictions per patient, nnDetection achieves a greater Intersection-over-Union (IoU)-based sensitivity than nnUNet for detecting PIRADS≥3 lesions, being 80.78% ± 1.50% versus 60.40% ± 1.64% (p<0.01). At the voxel level, nnUNet is in general superior or comparable to nnDetection. The proposed ensemble methods achieve improved or comparable lesion-level accuracy in all tested clinical scenarios. For example, at 3 false positives, the lesion-wise ensemble method achieves 82.24% ± 1.43% sensitivity versus 80.78% ± 1.50% (nnDetection) and 60.40% ± 1.64% (nnUNet) for detecting PIRADS≥3 lesions. Consistent conclusions are also drawn from results on the external data set.
Abstract: Earthquakes are highly destructive spatio-temporal phenomena whose analysis is essential for disaster preparedness and risk mitigation. Modern seismological research produces vast volumes of heterogeneous data from seismic networks, satellite observations, and geospatial repositories, creating the need for scalable infrastructures capable of integrating and analyzing such data to support intelligent decision-making. Data warehousing technologies provide a robust foundation for this purpose; however, existing earthquake-oriented data warehouses remain limited, often relying on simplified schemas, domain-specific analytics, or cataloguing efforts. This paper presents the design and implementation of a spatio-temporal data warehouse for seismic activity. The framework integrates spatial and temporal dimensions in a unified schema and introduces a novel array-based approach for managing many-to-many relationships between facts and dimensions without intermediate bridge tables. A comparative evaluation against a conventional bridge-table schema demonstrates that the array-based design improves fact-centric query performance, while the bridge-table schema remains advantageous for dimension-centric queries. To reconcile these trade-offs, a hybrid schema is proposed that retains both representations, ensuring balanced efficiency across heterogeneous workloads. The proposed framework demonstrates how spatio-temporal data warehousing can address schema complexity, improve query performance, and support multidimensional visualization. In doing so, it provides a foundation for integrating seismic analysis into broader big data-driven intelligent decision systems for disaster resilience, risk mitigation, and emergency management.
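The trade-off between array-based and bridge-table representations of a many-to-many relationship can be illustrated with a small in-memory analogy. The sketch below is not the paper's schema and does not involve a database engine; the toy fact and region identifiers and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy data: earthquake facts and the regions each one affected.
# Bridge-table style: one (fact_id, region_id) row per relationship.
bridge_rows: List[Tuple[int, int]] = [(1, 10), (1, 11), (2, 11), (3, 10), (3, 12)]

# Array-based style: each fact row carries its region ids directly.
facts_with_arrays: Dict[int, List[int]] = {1: [10, 11], 2: [11], 3: [10, 12]}


def regions_of_fact_bridge(fact_id: int) -> List[int]:
    """Fact-centric lookup via the bridge rows (needs a join-like scan)."""
    return [region for fact, region in bridge_rows if fact == fact_id]


def regions_of_fact_array(fact_id: int) -> List[int]:
    """Fact-centric lookup via the array column (single row access)."""
    return facts_with_arrays[fact_id]


def facts_in_region_array(region_id: int) -> List[int]:
    """Dimension-centric lookup on arrays must inspect every fact's array."""
    return [fact for fact, regions in facts_with_arrays.items() if region_id in regions]


def build_region_index(rows: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Bridge rows are easy to index by dimension key for dimension-centric queries."""
    index: Dict[int, List[int]] = defaultdict(list)
    for fact, region in rows:
        index[region].append(fact)
    return dict(index)


region_index = build_region_index(bridge_rows)
```

Keeping both `facts_with_arrays` and `region_index` side by side mirrors the idea of a hybrid schema: each query shape uses the representation that answers it without a scan, at the cost of storing the relationship twice.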
Funding: This study was conducted within the project FraxVir, "Detection, characterisation and analyses of the occurrence of viruses and ash dieback in special stands of Fraxinus excelsior - a supplementary study to the FraxForFuture demonstration project", and received funding via the Waldklimafonds (WKF), funded by the German Federal Ministry of Food and Agriculture (BMEL) and the Federal Ministry for the Environment, Nature Conservation, Nuclear Safety and Consumer Protection (BMUV) and administered by the Agency for Renewable Resources (FNR) under grant agreement 2220WK40A4.
Abstract: Detailed individual tree crown segmentation is highly relevant for the detection and monitoring of Fraxinus excelsior L. trees affected by ash dieback, a major threat to common ash populations across Europe. In this study, both fine and coarse crown segmentation methods were applied to close-range multispectral UAV imagery. The fine tree crown segmentation method utilized a novel unsupervised machine learning approach based on a blended NIR-NDVI image, whereas the coarse segmentation relied on the segment anything model (SAM). Both methods successfully delineated tree crown outlines; however, only the fine segmentation accurately captured internal canopy gaps. Despite these structural differences, mean NDVI values calculated per tree crown revealed no significant differences between the two approaches, indicating that coarse segmentation is sufficient for mean vegetation index assessments. Nevertheless, the fine segmentation revealed increased heterogeneity in NDVI values in more severely damaged trees, underscoring its value for detailed structural and health analyses. Furthermore, the fine segmentation workflow proved transferable to both individual UAV images and orthophotos from broader UAV surveys. For applications focused on structural integrity and spatial variation in canopy health, the fine segmentation approach is recommended.
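The per-crown NDVI statistics discussed above rest on the standard definition NDVI = (NIR - Red) / (NIR + Red). The sketch below computes the mean and spread of NDVI inside a crown mask; it is a toy illustration, not the study's workflow, and it assumes flattened reflectance lists plus a 0/1 crown mask. The function names and sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    # Assumed convention: return 0.0 when both bands are zero.
    return (nir - red) / total if total else 0.0


def crown_ndvi_stats(
    nir: Sequence[float], red: Sequence[float], crown_mask: Sequence[int]
) -> Tuple[float, float]:
    """Return (mean NDVI, population standard deviation) over pixels inside the crown."""
    values = [ndvi(n, r) for n, r, inside in zip(nir, red, crown_mask) if inside]
    if not values:
        raise ValueError("crown mask selects no pixels")
    return mean(values), pstdev(values)


# Hypothetical reflectances for four pixels; the third pixel is a canopy gap.
nir_band = [0.8, 0.6, 0.5, 0.9]
red_band = [0.2, 0.2, 0.5, 0.1]
coarse_mask = [1, 1, 1, 1]  # outline only, gap included
fine_mask = [1, 1, 0, 1]    # gap excluded

coarse_mean, coarse_spread = crown_ndvi_stats(nir_band, red_band, coarse_mask)
fine_mean, fine_spread = crown_ndvi_stats(nir_band, red_band, fine_mask)
```

The spread value is one simple way to express within-crown heterogeneity; the study may use a different heterogeneity measure.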
Funding: Supported by the Shenzhen Science and Technology Program (No. JCYJ20240813152704006); the National Natural Science Foundation of China (No. 62401259); the Fundamental Research Funds for the Central Universities (No. NZ2024036); the Postdoctoral Fellowship Program of CPSF (No. GZC20242228); and the High Performance Computing Platform of Nanjing University of Aeronautics and Astronautics.
Abstract: AIM: To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy (CSC) leakage points, thereby enabling ophthalmologists to deliver accurate laser treatment without navigational laser equipment. METHODS: A dataset with dual labels (point-level and pixel-level) was first established based on fundus fluorescein angiography (FFA) images of CSC and subsequently divided into training (102 images), validation (40 images), and test (40 images) datasets. An intelligent segmentation method was then developed, based on the You Only Look Once version 8 Pose Estimation (YOLOv8-Pose) model and the segment anything model (SAM), to segment CSC leakage points. Next, the YOLOv8-Pose model was trained for 200 epochs, and the best-performing model was selected to form the optimal combination with SAM. Additionally, five classic U-Net series models [i.e., U-Net, recurrent residual U-Net (R2U-Net), attention U-Net (AttU-Net), recurrent residual attention U-Net (R2AttUNet), and nested U-Net (UNet++)] were initialized with three random seeds and trained for 200 epochs, resulting in a total of 15 baseline models for comparison. Finally, based on metrics including Dice similarity coefficient (DICE), intersection over union (IoU), precision, recall, precision-recall (PR) curve, and receiver operating characteristic (ROC) curve, the proposed method was compared with the baseline models through quantitative and qualitative experiments for leakage point segmentation, thereby demonstrating its effectiveness. RESULTS: With increasing training epochs, the mAP50-95, recall, and precision of the YOLOv8-Pose model increased significantly and then stabilized, and the model achieved a preliminary localization success rate of 90% (i.e., 36 images) for CSC leakage points in 40 test images. Using pixel-level labels manually annotated by experts as the ground truth, the proposed method achieved a DICE of 57.13%, an IoU of 45.31%, a precision of 45.91%, a recall of 93.57%, an area under the PR curve (AUC-PR) of 0.78 and an area under the ROC curve (AUC-ROC) of 0.97, which enables more accurate segmentation of CSC leakage points. CONCLUSION: By combining the precise localization capability of the YOLOv8-Pose model with the robust and flexible segmentation ability of SAM, the proposed method not only demonstrates the effectiveness of the YOLOv8-Pose model in detecting keypoint coordinates of CSC leakage points from the perspective of application innovation but also establishes a novel approach for accurate segmentation of CSC leakage points through the "detect-then-segment" strategy, thereby providing a potential auxiliary means for the automatic and precise real-time localization of leakage points during traditional laser photocoagulation for CSC.
Abstract: Medical image segmentation is of critical importance in the domain of contemporary medical imaging. However, U-Net and its variants exhibit limitations in capturing complex nonlinear patterns and global contextual information. Although the subsequent U-KAN model enhances nonlinear representation capabilities, it still faces challenges such as gradient vanishing during deep network training and spatial detail loss during feature downsampling, resulting in insufficient segmentation accuracy for edge structures and minute lesions. To address these challenges, this paper proposes the RE-UKAN model, which innovatively improves upon U-KAN. Firstly, a residual network is introduced into the encoder to effectively mitigate gradient vanishing through cross-layer identity mappings, thus enhancing modelling capabilities for complex pathological structures. Secondly, Efficient Local Attention (ELA) is integrated to suppress spatial detail loss during downsampling, thereby improving the perception of edge structures and minute lesions. Experimental results on four public datasets demonstrate that RE-UKAN outperforms existing medical image segmentation methods across multiple evaluation metrics, with particularly outstanding performance on the TN-SCUI 2020 dataset, achieving an IoU of 88.18% and a Dice of 93.57%. Compared to the baseline model, it achieves improvements of 3.05% and 1.72%, respectively. These results fully demonstrate RE-UKAN's superior detail retention capability and boundary recognition accuracy in complex medical image segmentation tasks, providing a reliable solution for clinical precision segmentation.
Abstract: Weakly Supervised Semantic Segmentation (WSSS), which relies only on image-level labels, has attracted significant attention for its cost-effectiveness and scalability. Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations. However, they often neglect the complex contextual dependencies among image patches, resulting in incomplete local representations and limited segmentation accuracy. To address these issues, we propose the Context Patch Fusion with Class Token Enhancement (CPF-CTE) framework, which exploits contextual relations among patches to enrich feature representations and improve segmentation. At its core, the Contextual-Fusion Bidirectional Long Short-Term Memory (CF-BiLSTM) module captures spatial dependencies between patches and enables bidirectional information flow, yielding a more comprehensive understanding of spatial correlations. This strengthens feature learning and segmentation robustness. Moreover, we introduce learnable class tokens that dynamically encode and refine class-specific semantics, enhancing discriminative capability. By effectively integrating spatial and semantic cues, CPF-CTE produces richer and more accurate representations of image content. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.
Funding: Funded by the National Natural Science Foundation of China (52061020).
Abstract: Quantitative analysis of aluminum-silicon (Al-Si) alloy microstructure is crucial for evaluating and controlling alloy performance. Conventional analysis methods rely on manual segmentation, which is inefficient and subjective, while fully supervised deep learning approaches require extensive and expensive pixel-level annotated data. Furthermore, existing semi-supervised methods still face challenges in handling the adhesion of adjacent primary silicon particles and in effectively exploiting consistency in unlabeled data. To address these issues, this paper proposes a novel semi-supervised framework for Al-Si alloy microstructure image segmentation. First, we introduce a Rotational Uncertainty Correction Strategy (RUCS). This strategy employs multi-angle rotational perturbations and Monte Carlo sampling to assess prediction consistency, generating a pixel-wise confidence weight map. By integrating this map into the loss function, the model dynamically focuses on high-confidence regions, thereby improving generalization ability while reducing the manual annotation burden. Second, we design a Boundary Enhancement Module (BEM) to strengthen boundary feature extraction through erosion difference and multi-scale dilated convolutions. This module guides the model to focus on the boundary regions of adjacent particles, effectively resolving particle adhesion and improving segmentation accuracy. Systematic experiments were conducted on the Aluminum-Silicon Alloy Microstructure Dataset (ASAD). Results indicate that the proposed method performs exceptionally well with scarce labeled data. Specifically, using only 5% labeled data, our method improves the Jaccard index and Adjusted Rand Index (ARI) by 2.84 and 1.57 percentage points, respectively, and reduces the Variation of Information (VI) by 8.65 compared to state-of-the-art semi-supervised models, approaching the performance levels of 10% labeled data. These results demonstrate that the proposed method significantly enhances the accuracy and robustness of quantitative microstructure analysis while reducing annotation costs.
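The abstract above describes RUCS only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes rotations in 90-degree steps only, a hypothetical stochastic predictor `predict(image, rng)` (for example, a network with dropout left active), and a confidence mapping of `exp(-variance)`; all three are illustrative choices not stated in the source.

```python
import math
import random


def rot90(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rot90_k(grid, k):
    """Rotate a 2D list clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        grid = rot90(grid)
    return grid


def confidence_map(image, predict, n_samples=4, seed=0):
    """Pixel-wise confidence from prediction consistency under rotation.

    `predict(image, rng)` is a hypothetical stochastic model that returns
    per-pixel foreground probabilities with the same shape as its input.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    preds = []
    for k in range(4):                        # 0, 90, 180, 270 degrees
        rotated = rot90_k(image, k)
        for _ in range(n_samples):            # Monte Carlo passes per angle
            p = predict(rotated, rng)
            preds.append(rot90_k(p, 4 - k))   # rotate back to the input frame
    n = len(preds)
    conf = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            vals = [p[i][j] for p in preds]
            mean = sum(vals) / n
            var = sum((v - mean) ** 2 for v in vals) / n
            conf[i][j] = math.exp(-var)       # assumed mapping: low variance, weight near 1
    return conf


def weighted_bce(prob, target, conf, eps=1e-7):
    """Binary cross-entropy averaged with the confidence map as pixel weights."""
    total, weight_sum = 0.0, 0.0
    for p_row, t_row, c_row in zip(prob, target, conf):
        for p, t, c in zip(p_row, t_row, c_row):
            p = min(max(p, eps), 1 - eps)
            total += -c * (t * math.log(p) + (1 - t) * math.log(1 - p))
            weight_sum += c
    return total / weight_sum
```

Pixels where the rotated and resampled predictions disagree get a weight below 1, so they contribute less to the loss than pixels with stable predictions. Arbitrary rotation angles would additionally need interpolation, which this sketch leaves out.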
Funding: Funded by the National Natural Science Foundation of China, grant number 62262045; the Fundamental Research Funds for the Central Universities, grant number 2023CDJYGRH-YB11; and the Open Funding of SUGON Industrial Control and Security Center, grant number CUIT-SICSC-2025-03.
Abstract: Autonomous vehicles rely heavily on accurate and efficient scene segmentation for safe navigation and efficient operations. Traditional Bird's Eye View (BEV) methods for semantic scene segmentation, which leverage multimodal sensor fusion, often struggle with noisy data and demand high-performance GPUs, leading to sensor misalignment and performance degradation. This paper introduces Enhanced Channel Attention BEV (ECABEV), a novel approach designed to address these challenges under insufficient GPU memory conditions. ECABEV integrates camera and radar data through a de-noise enhanced channel attention mechanism, which utilizes global average and max pooling to effectively filter out noise while preserving discriminative features. Furthermore, an improved fusion approach is proposed to efficiently merge categorical data across modalities. To reduce computational overhead, a bilinear interpolation layer normalization method is devised to ensure spatial feature fidelity. Moreover, a scalable cross-entropy loss function is designed to handle imbalanced classes with little loss of computational efficiency. Extensive experiments on the nuScenes dataset demonstrate that ECABEV achieves state-of-the-art performance with an IoU of 39.961, using a lightweight ViT-B/14 backbone and lower resolution (224×224). These results highlight the cost-effectiveness and practical applicability of the approach, even on low-end devices. The code is publicly available at: https://github.com/YYF-CQU/ECABEV.git.
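To make the idea of channel attention built on global average and max pooling concrete, here is a minimal sketch. It follows the widely used pattern of scoring both pooled descriptors with one shared two-layer perceptron and applying a sigmoid; that layout, the function names, and the weight shapes are assumptions for illustration and are not taken from the ECABEV code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp(vec, w1, w2):
    """Two-layer perceptron with ReLU; w1 is (hidden x C), w2 is (C x hidden)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    """Reweight each channel of `feat`, given as C x H x W nested lists.

    Returns the rescaled feature map and the per-channel weights.
    """
    flat = [[v for row in ch for v in row] for ch in feat]
    avg_desc = [sum(ch) / len(ch) for ch in flat]   # global average pooling
    max_desc = [max(ch) for ch in flat]             # global max pooling
    # The same MLP scores both descriptors; the scores are summed and squashed.
    scores = [a + m for a, m in zip(mlp(avg_desc, w1, w2), mlp(max_desc, w1, w2))]
    weights = [sigmoid(s) for s in scores]
    out = [[[v * wt for v in row] for row in ch] for ch, wt in zip(feat, weights)]
    return out, weights
```

Channels whose pooled statistics score low receive weights near 0 and are suppressed, which is the sense in which such a mechanism can filter noisy channels while keeping discriminative ones.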
Funding: Supported by the International Partnership Program of the Chinese Academy of Sciences (170GJHZ2023074GC); the National Natural Science Foundation of China (42425706 and 42488201); the National Key Research and Development Program of China (2024YFF0807902); the Beijing Natural Science Foundation (8242041); and the China Postdoctoral Science Foundation (2025M770353).
Abstract: Accurately assessing the relationship between tree growth and climatic factors is of great importance in dendrochronology. This study evaluated the consistency between alternative climate datasets (including station and gridded data) and actual climate data (fixed-point observations near the sampling sites) in northeastern China's warm temperate zone, and analyzed differences in their correlations with the tree-ring width index. The results were: (1) Gridded temperature data, as well as precipitation and relative humidity data from the Huailai meteorological station, were more consistent with the actual climate data; in contrast, gridded soil moisture content data showed significant discrepancies. (2) Horizontal distance had a greater impact on the representativeness of actual climate conditions than vertical elevation differences. (3) Differences in consistency between alternative and actual climate data also affected their correlations with tree-ring width indices. In some growing-season months, correlation coefficients differed significantly, in both magnitude and sign, from those based on actual data. The selection of different alternative climate datasets can lead to biased results in assessing forest responses to climate change, which is detrimental to the management of forest ecosystems in harsh environments. Therefore, the scientific and rational selection of alternative climate data is essential for dendroecological and climatological research.
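The comparison in the abstract above rests on correlation coefficients between a tree-ring width index and a climate series. As a minimal sketch, assuming the Pearson coefficient is the statistic of interest (the abstract does not name it), it can be computed as follows; the function name is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)
```

Running the same function once with the fixed-point observations and once with a station or gridded series for the same month shows directly whether the coefficient changes in magnitude or sign.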
Funding: Provided by the Science Research Project of Hebei Education Department under grant No. BJK2024115.
Abstract: High-resolution remote sensing images (HRSIs) are now an essential data source for gathering surface information due to advancements in remote sensing data capture technologies. However, their significant scale changes and wealth of spatial details pose challenges for semantic segmentation. While convolutional neural networks (CNNs) excel at capturing local features, they are limited in modeling long-range dependencies. Conversely, transformers utilize multihead self-attention to integrate global context effectively, but this approach often incurs a high computational cost. This paper proposes a global-local multiscale context network (GLMCNet) to extract both global and local multiscale contextual information from HRSIs. A detail-enhanced filtering module (DEFM) is proposed at the end of the encoder to further refine the encoder outputs, thereby enhancing the key details extracted by the encoder and effectively suppressing redundant information. In addition, a global-local multiscale transformer block (GLMTB) is proposed in the decoding stage to enable the modeling of rich multiscale global and local information. We also design a stair fusion mechanism to progressively transmit semantic information from deep to shallow layers. Finally, we propose the semantic awareness enhancement module (SAEM), which further enhances the representation of multiscale semantic features through spatial attention and covariance channel attention. Extensive ablation analyses and comparative experiments were conducted to evaluate the performance of the proposed method. Specifically, our method achieved a mean Intersection over Union (mIoU) of 86.89% on the ISPRS Potsdam dataset and 84.34% on the ISPRS Vaihingen dataset, outperforming existing models such as ABCNet and BANet.
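For readers unfamiliar with the mIoU metric reported above, a minimal sketch of its computation from flattened label sequences follows. Skipping classes that appear in neither the prediction nor the ground truth is one common convention and is an assumption here; benchmark toolkits may handle such classes differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for flat sequences of integer class labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:                 # skip classes absent from both inputs
            ious.append(intersection[c] / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)
```

Because each class contributes equally to the average regardless of its pixel count, mIoU penalizes poor performance on rare classes more than overall pixel accuracy does.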
Funding: Financially supported by the Open Project Program of Wuhan National Laboratory for Optoelectronics (No. 2022WNLOKF009); the National Natural Science Foundation of China (No. 62475216); the Key Research and Development Program of Shaanxi (No. 2024GH-ZDXM-37); the Fujian Provincial Natural Science Foundation of China (No. 2024J01060); the Startup Program of XMU; and the Fundamental Research Funds for the Central Universities.
Abstract: Microscopy imaging is fundamental in analyzing bacterial morphology and dynamics, offering critical insights into bacterial physiology and pathogenicity. Image segmentation techniques enable quantitative analysis of bacterial structures, facilitating precise measurement of morphological variations and population behaviors at single-cell resolution. This paper reviews advancements in bacterial image segmentation, emphasizing the shift from traditional thresholding and watershed methods to deep learning-driven approaches. Convolutional neural networks (CNNs), U-Net architectures, and three-dimensional (3D) frameworks excel at segmenting dense biofilms and resolving antibiotic-induced morphological changes. These methods combine automated feature extraction with physics-informed postprocessing. Despite progress, challenges persist in computational efficiency, cross-species generalizability, and integration with multimodal experimental workflows. Future progress will depend on improving model robustness across species and imaging modalities, integrating multimodal data for phenotype-function mapping, and developing standard pipelines that link computational tools with clinical diagnostics. These innovations will expand microbial phenotyping beyond structural analysis, enabling deeper insights into bacterial physiology and ecological interactions.
Abstract: Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods, which often demand extensive computational resources and struggle with diverse data acquisition techniques. This research presents a novel approach for vehicle classification and recognition in aerial image sequences, integrating multiple advanced techniques to enhance detection accuracy. The proposed model begins with preprocessing using Multiscale Retinex (MSR) to enhance image quality, followed by Expectation-Maximization (EM) segmentation for precise foreground object identification. Vehicle detection is performed using the YOLOv10 framework, while feature extraction incorporates Maximally Stable Extremal Regions (MSER), Dense Scale-Invariant Feature Transform (Dense SIFT), and Zernike moment features to capture distinct object characteristics. The features are then refined by a Hybrid Swarm-based Optimization algorithm, which selects the feature subset used for classification. The final classification is conducted using a Vision Transformer. Experimental evaluations on benchmark datasets, including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset (UAVID), show that the proposed approach achieves an accuracy of 94.40% on UAVDT and 93.57% on UAVID. The results highlight the efficacy of the model in enhancing vehicle detection and classification in aerial imagery, outperforming existing methodologies and offering a statistically validated improvement for intelligent traffic monitoring systems.
Abstract: Weakly supervised semantic segmentation (WSSS) is a challenging task in which only category-level information is provided for segmentation prediction. Thus, the key stage of WSSS is generating the pseudo labels. In convolutional neural network (CNN) based methods, class activation mapping (CAM) is used to obtain the pseudo labels, but it concentrates only on the most discriminative parts. More recently, transformer-based methods use attention maps from the multi-headed self-attention (MHSA) module to predict pseudo labels, which usually contain obvious background noise and incoherent object areas. To solve these problems, we use the Conformer as our backbone, a parallel network that combines a CNN branch and a Transformer branch. The two branches generate and refine pseudo labels independently, and can effectively combine the advantages of CNN and Transformer. However, information exchange between the parallel branches is not tight enough. As a result, the pseudo labels lack detail and background noise persists. To alleviate this problem, we propose the enhancing convolution CAM (ECCAM) model, which has three improved modules based on enhancing convolution: a deeper stem (DStem), a convolutional feed-forward network (CFFN), and a feature coupling unit with convolution (FCUConv). ECCAM gives the Conformer tighter interaction between its CNN and Transformer branches. Experiments verify that the proposed modules help the network perceive more local information from images, making the final segmentation results more refined. Compared with similar architectures, our modules greatly improve semantic segmentation performance and achieve 70.2% mean intersection over union (mIoU) on the PASCAL VOC 2012 dataset.
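As background for the pseudo-label step discussed above, the following is a minimal sketch of a plain class activation map: the final-layer feature maps are summed with the classifier weights of one class, rectified, normalized, and thresholded. This is the generic CAM recipe, not the ECCAM model itself; the function names and the 0.5 threshold are illustrative assumptions.

```python
def class_activation_map(features, class_weights):
    """Weighted sum of feature maps for one class, scaled to [0, 1].

    `features` is C x H x W nested lists; `class_weights` holds the C
    classifier weights of the target class.
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [
        [
            # ReLU keeps only evidence in favour of the class.
            max(0.0, sum(wt * ch[i][j] for wt, ch in zip(class_weights, features)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def pseudo_label(cam, threshold=0.5):
    """Binary pseudo label: 1 where the normalized activation reaches the threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in cam]
```

Since the map peaks on the regions that most influence the classifier, thresholding it tends to mark only the most discriminative object parts, which is the limitation the abstract attributes to CAM-based pseudo labels.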