Image segmentation is vital when analyzing medical images,especially magnetic resonance(MR)images of the brain.Recently,several image segmentation techniques based on multilevel thresholding have been proposed for med...Image segmentation is vital when analyzing medical images,especially magnetic resonance(MR)images of the brain.Recently,several image segmentation techniques based on multilevel thresholding have been proposed for medical image segmentation;however,the algorithms become trapped in local minima and have low convergence speeds,particularly as the number of threshold levels increases.Consequently,in this paper,we develop a new multilevel thresholding image segmentation technique based on the jellyfish search algorithm(JSA)(an optimizer).We modify the JSA to prevent descents into local minima,and we accelerate convergence toward optimal solutions.The improvement is achieved by applying two novel strategies:Rankingbased updating and an adaptive method.Ranking-based updating is used to replace undesirable solutions with other solutions generated by a novel updating scheme that improves the qualities of the removed solutions.We develop a new adaptive strategy to exploit the ability of the JSA to find a best-so-far solution;we allow a small amount of exploration to avoid descents into local minima.The two strategies are integrated with the JSA to produce an improved JSA(IJSA)that optimally thresholds brain MR images.To compare the performances of the IJSA and JSA,seven brain MR images were segmented at threshold levels of 3,4,5,6,7,8,10,15,20,25,and 30.IJSA was compared with several other recent image segmentation algorithms,including the improved and standard marine predator algorithms,the modified salp and standard salp swarm algorithms,the equilibrium optimizer,and the standard JSA in terms of fitness,the Structured Similarity Index Metric(SSIM),the peak signal-to-noise ratio(PSNR),the standard deviation(SD),and the Features Similarity Index Metric(FSIM).The experimental outcomes and the Wilcoxon rank-sum test demonstrate the superiority of the proposed algorithm in terms of the FSIM,the PSNR,the objective values,and the SD;in terms of the SSIM,IJSA was competitive with the others.展开更多
Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment.Their applications span from high-throughput drug screening to the modeling of...Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment.Their applications span from high-throughput drug screening to the modeling of complex diseases,with some even achieving clinical translation.Changes in the overall size,shape,boundary,and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity.However,the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and interference,including overlapping organoids,bubbles,dust particles,and cell fragments.This paper introduces the precision organoid segmentation technique(POST),which is a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions.Unlike existing methods,POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging.Furthermore,it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments.POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances and thereby streamlining the organoid analysis and drug screening process.展开更多
Microscopy imaging is fundamental in analyzing bacterial morphology and dynamics,offering critical insights into bacterial physiology and pathogenicity.Image segmentation techniques enable quantitative analysis of bac...Microscopy imaging is fundamental in analyzing bacterial morphology and dynamics,offering critical insights into bacterial physiology and pathogenicity.Image segmentation techniques enable quantitative analysis of bacterial structures,facilitating precise measurement of morphological variations and population behaviors at single-cell resolution.This paper reviews advancements in bacterial image segmentation,emphasizing the shift from traditional thresholding and watershed methods to deep learning-driven approaches.Convolutional neural networks(CNNs),U-Net architectures,and three-dimensional(3D)frameworks excel at segmenting dense biofilms and resolving antibiotic-induced morphological changes.These methods combine automated feature extraction with physics-informed postprocessing.Despite progress,challenges persist in computational efficiency,cross-species generalizability,and integration with multimodal experimental workflows.Future progress will depend on improving model robustness across species and imaging modalities,integrating multimodal data for phenotype-function mapping,and developing standard pipelines that link computational tools with clinical diagnostics.These innovations will expand microbial phenotyping beyond structural analysis,enabling deeper insights into bacterial physiology and ecological interactions.展开更多
Image segmentation is attracting increasing attention in the field of medical image analysis.Since widespread utilization across various medical applications,ensuring and improving segmentation accuracy has become a c...Image segmentation is attracting increasing attention in the field of medical image analysis.Since widespread utilization across various medical applications,ensuring and improving segmentation accuracy has become a crucial topic of research.With advances in deep learning,researchers have developed numerous methods that combine Transformers and convolutional neural networks(CNNs)to create highly accurate models for medical image segmentation.However,efforts to further enhance accuracy by developing larger and more complex models or training with more extensive datasets,significantly increase computational resource consumption.To address this problem,we propose BiCLIP-nnFormer(the prefix"Bi"refers to the use of two distinct CLIP models),a virtual multimodal instrument that leverages CLIP models to enhance the segmentation performance of a medical segmentation model nnFormer.Since two CLIP models(PMC-CLIP and CoCa-CLIP)are pre-trained on large datasets,they do not require additional training,thus conserving computation resources.These models are used offline to extract image and text embeddings from medical images.These embeddings are then processed by the proposed 3D CLIP adapter,which adapts the CLIP knowledge for segmentation tasks by fine-tuning.Finally,the adapted embeddings are fused with feature maps extracted from the nnFormer encoder for generating predicted masks.This process enriches the representation capabilities of the feature maps by integrating global multimodal information,leading to more precise segmentation predictions.We demonstrate the superiority of BiCLIP-nnFormer and the effectiveness of using CLIP models to enhance nnFormer through experiments on two public datasets,namely the Synapse multi-organ segmentation dataset(Synapse)and the Automatic Cardiac Diagnosis Challenge dataset(ACDC),as well as a self-annotated lung multi-category segmentation dataset(LMCS).展开更多
Potential high-temperature risks exist in heat-prone components of electric moped charging devices,such as sockets,interfaces,and controllers.Traditional detection methods have limitations in terms of real-time perfor...Potential high-temperature risks exist in heat-prone components of electric moped charging devices,such as sockets,interfaces,and controllers.Traditional detection methods have limitations in terms of real-time performance and monitoring scope.To address this,a temperature detection method based on infrared image processing has been proposed:utilizing the median filtering algorithm to denoise the original infrared image,then applying an image segmentation algorithm to divide the image.展开更多
The visual noise of each light intensity area is different when the image is drawn by Monte Carlo method.However,the existing denoising algorithms have limited denoising performance under complex lighting conditions a...The visual noise of each light intensity area is different when the image is drawn by Monte Carlo method.However,the existing denoising algorithms have limited denoising performance under complex lighting conditions and are easy to lose detailed information.So we propose a rendered image denoising method with filtering guided by lighting information.First,we design an image segmentation algorithm based on lighting information to segment the image into different illumination areas.Then,we establish the parameter prediction model guided by lighting information for filtering(PGLF)to predict the filtering parameters of different illumination areas.For different illumination areas,we use these filtering parameters to construct area filters,and the filters are guided by the lighting information to perform sub-area filtering.Finally,the filtering results are fused with auxiliary features to output denoised images for improving the overall denoising effect of the image.Under the physically based rendering tool(PBRT)scene and Tungsten dataset,the experimental results show that compared with other guided filtering denoising methods,our method improves the peak signal-to-noise ratio(PSNR)metrics by 4.2164 dB on average and the structural similarity index(SSIM)metrics by 7.8%on average.This shows that our method can better reduce the noise in complex lighting scenesand improvethe imagequality.展开更多
Data augmentation plays an important role in training deep neural model by expanding the size and diversity of the dataset.Initially,data augmentation mainly involved some simple transformations of images.Later,in ord...Data augmentation plays an important role in training deep neural model by expanding the size and diversity of the dataset.Initially,data augmentation mainly involved some simple transformations of images.Later,in order to increase the diversity and complexity of data,more advanced methods appeared and evolved to sophisticated generative models.However,these methods required a mass of computation of training or searching.In this paper,a novel training-free method that utilises the Pre-Trained Segment Anything Model(SAM)model as a data augmentation tool(PTSAM-DA)is proposed to generate the augmented annotations for images.Without the need for training,it obtains prompt boxes from the original annotations and then feeds the boxes to the pre-trained SAM to generate diverse and improved annotations.In this way,annotations are augmented more ingenious than simple manipulations without incurring huge computation for training a data augmentation model.Multiple comparative experiments on three datasets are conducted,including an in-house dataset,ADE20K and COCO2017.On this in-house dataset,namely Agricultural Plot Segmentation Dataset,maximum improvements of 3.77%and 8.92%are gained in two mainstream metrics,mIoU and mAcc,respectively.Consequently,large vision models like SAM are proven to be promising not only in image segmentation but also in data augmentation.展开更多
Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to ...Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.展开更多
Convolutional neural network(CNN)with the encoder-decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global featu...Convolutional neural network(CNN)with the encoder-decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global feature.The transformer can extract the global information well but adapting it to small medical datasets is challenging and its computational complexity can be heavy.In this work,a serial and parallel network is proposed for the accurate 3D medical image segmentation by combining CNN and transformer and promoting feature interactions across various semantic levels.The core components of the proposed method include the cross window self-attention based transformer(CWST)and multi-scale local enhanced(MLE)modules.The CWST module enhances the global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows.The MLE module selectively fuses features by computing the voxel attention between different branch features,and uses convolution to strengthen the dense local information.The experiments on the prostate,atrium,and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as dice similarity coefficient,Intersection over Union,95%Hausdorff distance and average symmetric surface distance.展开更多
Existing semi-supervisedmedical image segmentation algorithms use copy-paste data augmentation to correct the labeled-unlabeled data distribution mismatch.However,current copy-paste methods have three limitations:(1)t...Existing semi-supervisedmedical image segmentation algorithms use copy-paste data augmentation to correct the labeled-unlabeled data distribution mismatch.However,current copy-paste methods have three limitations:(1)training the model solely with copy-paste mixed pictures from labeled and unlabeled input loses a lot of labeled information;(2)low-quality pseudo-labels can cause confirmation bias in pseudo-supervised learning on unlabeled data;(3)the segmentation performance in low-contrast and local regions is less than optimal.We design a Stochastic Augmentation-Based Dual-Teaching Auxiliary Training Strategy(SADT),which enhances feature diversity and learns high-quality features to overcome these problems.To be more precise,SADT trains the Student Network by using pseudo-label-based training from Teacher Network 1 and supervised learning with labeled data,which prevents the loss of rare labeled data.We introduce a bi-directional copy-pastemask with progressive high-entropy filtering to reduce data distribution disparities and mitigate confirmation bias in pseudo-supervision.For the mixed images,Deep-Shallow Spatial Contrastive Learning(DSSCL)is proposed in the feature spaces of Teacher Network 2 and the Student Network to improve the segmentation capabilities in low-contrast and local areas.In this procedure,the features retrieved by the Student Network are subjected to a random feature perturbation technique.On two openly available datasets,extensive trials show that our proposed SADT performs much better than the state-ofthe-art semi-supervised medical segmentation techniques.Using only 10%of the labeled data for training,SADT was able to acquire a Dice score of 90.10%on the ACDC(Automatic Cardiac Diagnosis Challenge)dataset.展开更多
Lung cancer(LC)is a major cancer which accounts for higher mortality rates worldwide.Doctors utilise many imaging modalities for identifying lung tumours and their severity in earlier stages.Nowadays,machine learning(...Lung cancer(LC)is a major cancer which accounts for higher mortality rates worldwide.Doctors utilise many imaging modalities for identifying lung tumours and their severity in earlier stages.Nowadays,machine learning(ML)and deep learning(DL)methodologies are utilised for the robust detection and prediction of lung tumours.Recently,multi modal imaging emerged as a robust technique for lung tumour detection by combining various imaging features.To cope with that,we propose a novel multi modal imaging technique named versatile scale malleable image integration and patch wise attention network(VSMI2−PANet)which adopts three imaging modalities named computed tomography(CT),magnetic resonance imaging(MRI)and single photon emission computed tomography(SPECT).The designed model accepts input from CT and MRI images and passes it to the VSMI2 module that is composed of three sub-modules named image cropping module,scale malleable convolution layer(SMCL)and PANet module.CT and MRI images are subjected to image cropping module in a parallel manner to crop the meaningful image patches and provide them to the SMCL module.The SMCL module is composed of adaptive convolutional layers that investigate those patches in a parallel manner by preserving the spatial information.The output from the SMCL is then fused and provided to the PANet module.The PANet module examines the fused patches by analysing its height,width and channels of the image patch.As a result,it provides an output as high-resolution spatial attention maps indicating the location of suspicious tumours.The high-resolution spatial attention maps are then provided as an input to the backbone module which uses light wave transformer(LWT)for segmenting the lung tumours into three classes,such as normal,benign and malignant.In addition,the LWT also accepts SPECT image as input for capturing the variations precisely to segment the lung tumours.The performance of the proposed model is validated using several performance metrics,such as accuracy,precision,recall,F1-score and AUC curve,and the results show that the proposed work outperforms the existing approaches.展开更多
Deep learning now underpins many state-of-the-art systems for biomedical image and signal processing,enabling automated lesion detection,physiological monitoring,and therapy planning with accuracy that rivals expert p...Deep learning now underpins many state-of-the-art systems for biomedical image and signal processing,enabling automated lesion detection,physiological monitoring,and therapy planning with accuracy that rivals expert performance.This survey reviews the principal model families as convolutional,recurrent,generative,reinforcement,autoencoder,and transfer-learning approaches as emphasising how their architectural choices map to tasks such as segmentation,classification,reconstruction,and anomaly detection.A dedicated treatment of multimodal fusion networks shows how imaging features can be integrated with genomic profiles and clinical records to yield more robust,context-aware predictions.To support clinical adoption,we outline post-hoc explainability techniques(Grad-CAM,SHAP,LIME)and describe emerging intrinsically interpretable designs that expose decision logic to end users.Regulatory guidance from the U.S.FDA,the European Medicines Agency,and the EU AI Act is summarised,linking transparency and lifecycle-monitoring requirements to concrete development practices.Remaining challenges as data imbalance,computational cost,privacy constraints,and cross-domain generalization are discussed alongside promising solutions such as federated learning,uncertainty quantification,and lightweight 3-D architectures.The article therefore offers researchers,clinicians,and policymakers a concise,practice-oriented roadmap for deploying trustworthy deep-learning systems in healthcare.展开更多
Medical image segmentation has become a cornerstone for many healthcare applications,allowing for the automated extraction of critical information from images such as Computed Tomography(CT)scans,Magnetic Resonance Im...Medical image segmentation has become a cornerstone for many healthcare applications,allowing for the automated extraction of critical information from images such as Computed Tomography(CT)scans,Magnetic Resonance Imaging(MRIs),and X-rays.The introduction of U-Net in 2015 has significantly advanced segmentation capabilities,especially for small datasets commonly found in medical imaging.Since then,various modifications to the original U-Net architecture have been proposed to enhance segmentation accuracy and tackle challenges like class imbalance,data scarcity,and multi-modal image processing.This paper provides a detailed review and comparison of several U-Net-based architectures,focusing on their effectiveness in medical image segmentation tasks.We evaluate performance metrics such as Dice Similarity Coefficient(DSC)and Intersection over Union(IoU)across different U-Net variants including HmsU-Net,CrossU-Net,mResU-Net,and others.Our results indicate that architectural enhancements such as transformers,attention mechanisms,and residual connections improve segmentation performance across diverse medical imaging applications,including tumor detection,organ segmentation,and lesion identification.The study also identifies current challenges in the field,including data variability,limited dataset sizes,and issues with class imbalance.Based on these findings,the paper suggests potential future directions for improving the robustness and clinical applicability of U-Net-based models in medical image segmentation.展开更多
Land use/land cover(LULC)change monitoring is critical for understanding environmental and socioeconomic processes and to identify patterns that may affect current and future land management.Forest cover evolution in ...Land use/land cover(LULC)change monitoring is critical for understanding environmental and socioeconomic processes and to identify patterns that may affect current and future land management.Forest cover evolution in the Mediterranean region has been studied to better understand forest succession,wildfires potential,and carbon stock assessment for climate change mitigation,among other reasons.However,though multiple sources of current LULC exist,data from last century's forest cover are less common,and are normally still reliant on locally orthophoto-interpreted data,making continuous maps of historical forest cover relatively uncommon.In this work,a pipeline based on image segmentation and random forest LULC modeling was developed to process three high resolution orthophotos(1956,1989,and 2021)into LULC continuous land cover maps of Spain's island of Ibiza.Next,they were combined to quantify forest evolution of Mediterranean Aleppo pine(Pinus halepensis Mill.)and to generate a continuous map of forest age classes.Our models were able to differentiate forestland with an accuracy higher than 80%in all cases,and were able to approximate forestland cover change since the mid-twentieth century,estimating 21,165±252 ha(37.0±0.4%)in 1956,27,099±472 ha(46.8±0.8%)in 1989,and 30,195±302 ha(52.8±0.5%)in 2021,with a mean increase of 139±6 ha(0.46±0.02%,calculated from current forest cover estimate)per year.The most important variables for the identification of the forestland were the terrain slope and the image gray level or color information in all orthophotos.When combining the information from the three periods,the analysis of forest evolution revealed that a significant portion of current forest cover,approximately 15,776 ha,fell within the 75-120 year age range,while 5388 ha fell within the range of 42-74 years,and 9022 ha within the 10-41 years forest age class.Younger forests,except when mapped after known wildfires,were not considered due to the limitations of the methodology.When compared to forest age data based on ground measurements,significant differences were found among each of the remotely sensed forest age classes,with a mean difference of 13 years between the theoretical age class central value and the real observed plot average age.Overall,63%of the forest inventory plots were assigned with the correct forest age class.This work will allow a better understanding of long-term Mediterranean forest dynamics and will help landowners and policymakers to better respond to new landscape planning challenges and achieve sustainable development goals.展开更多
Integrating multiple medical imaging techniques,including Magnetic Resonance Imaging(MRI),Computed Tomography,Positron Emission Tomography(PET),and ultrasound,provides a comprehensive view of the patient health status...Integrating multiple medical imaging techniques,including Magnetic Resonance Imaging(MRI),Computed Tomography,Positron Emission Tomography(PET),and ultrasound,provides a comprehensive view of the patient health status.Each of these methods contributes unique diagnostic insights,enhancing the overall assessment of patient condition.Nevertheless,the amalgamation of data from multiple modalities presents difficulties due to disparities in resolution,data collection methods,and noise levels.While traditional models like Convolutional Neural Networks(CNNs)excel in single-modality tasks,they struggle to handle multi-modal complexities,lacking the capacity to model global relationships.This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system.The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities.Additionally,it shows resilience to variations in noise and image quality,making it adaptable for real-time clinical use.To address the computational hurdles linked to transformer models,particularly in real-time clinical applications in resource-constrained environments,several optimization techniques have been integrated to boost scalability and efficiency.Initially,a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness.Methods such as model pruning,quantization,and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed.Furthermore,efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations.For further deployment optimization,researchers have implemented hardware-aware acceleration strategies,including the use of TensorRT and ONNX-based model compression,to ensure efficient execution on edge devices.These optimizations allow the approach to function effectively in real-time clinical settings,ensuring viability even in environments with limited resources.Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments.This study highlights the transformative potential of transformer models in multi-modal medical imaging,offering improvements in diagnostic accuracy and patient care outcomes.展开更多
Agromyzid leafminers cause significant economic losses in both vegetable and horticultural crops,and precise assessments of pesticide needs must be based on the extent of leaf damage.Traditionally,surveyors estimate t...Agromyzid leafminers cause significant economic losses in both vegetable and horticultural crops,and precise assessments of pesticide needs must be based on the extent of leaf damage.Traditionally,surveyors estimate the damage by visually comparing the proportion of damaged to intact leaf area,a method that lacks objectivity,precision,and reliable data traceability.To address these issues,an advanced survey system that combines augmented reality(AR)glasses with a camera and an artificial intelligence(AI)algorithm was developed in this study to objectively and accurately assess leafminer damage in the feld.By wearing AR glasses equipped with a voice-controlled camera,surveyors can easily flatten damaged leaves by hand and capture images for analysis.This method can provide a precise and reliable diagnosis of leafminer damage levels,which in turn supports the implementation of scientifically grounded and targeted pest management strategies.To calculate the leafminer damage level,the DeepLab-Leafminer model was proposed to precisely segment the leafminer-damaged regions and the intact leaf region.The integration of an edge-aware module and a Canny loss function into the DeepLabv3+model enhanced the DeepLab-Leafminer model's capability to accurately segment the edges of leafminer-damaged regions,which often exhibit irregular shapes.Compared with state-of-the-art segmentation models,the DeepLabLeafminer model achieved superior segmentation performance with an Intersection over Union(IoU)of 81.23%and an F1score of 87.92%on leafminer-damaged leaves.The test results revealed a 92.38%diagnosis accuracy of leafminer damage levels based on the DeepLab-Leafminer model.A mobile application and a web platform were developed to assist surveyors in displaying the diagnostic results of leafminer damage levels.This system provides surveyors with an advanced,user-friendly,and accurate tool for assessing agromyzid leafminer damage in agricultural felds using wearable AR glasses and an AI model.This method can also be utilized to automatically diagnose pest and disease damage levels in other crops based on leaf images.展开更多
The traditional EnFCM(Enhanced fuzzy C-means)algorithm only considers the grey-scale features in image segmentation,resulting in less than satisfactory results when the algorithm is used for remote sensing woodland im...The traditional EnFCM(Enhanced fuzzy C-means)algorithm only considers the grey-scale features in image segmentation,resulting in less than satisfactory results when the algorithm is used for remote sensing woodland image segmentation and extraction.An EnFCM remote sensing forest land extraction method based on PCA multi-feature fusion was proposed.Firstly,histogram equalization was applied to improve the image contrast.Secondly,the texture and edge features of the image were extracted,and a multi-feature fused pixel image was generated using the PCA technique.Moreover,the fused feature was used as a feature constraint to measure the difference of pixels instead of a single grey-scale feature.Finally,an improved feature distance metric calculated the similarity between the pixel points and the cluster center to complete the cluster segmentation.The experimental results showed that the error was between 1.5%and 4.0%compared with the forested area counted by experts’hand-drawing,which could obtain a high accuracy segmentation and extraction result.展开更多
The self-attention mechanism of Transformers,which captures long-range contextual information,has demonstrated significant potential in image segmentation.However,their ability to learn local,contextual relationships ...The self-attention mechanism of Transformers,which captures long-range contextual information,has demonstrated significant potential in image segmentation.However,their ability to learn local,contextual relationships between pixels requires further improvement.Previous methods face challenges in efficiently managing multi-scale fea-tures of different granularities from the encoder backbone,leaving room for improvement in their global representation and feature extraction capabilities.To address these challenges,we propose a novel Decoder with Multi-Head Feature Receptors(DMHFR),which receives multi-scale features from the encoder backbone and organizes them into three feature groups with different granularities:coarse,fine-grained,and full set.These groups are subsequently processed by Multi-Head Feature Receptors(MHFRs)after feature capture and modeling operations.MHFRs include two Three-Head Feature Receptors(THFRs)and one Four-Head Feature Receptor(FHFR).Each group of features is passed through these MHFRs and then fed into axial transformers,which help the model capture long-range dependencies within the features.The three MHFRs produce three distinct feature outputs.The output from the FHFR serves as auxiliary auxiliary features in the prediction head,and the prediction output and their losses will eventually be aggregated.Experimental results show that the Transformer using DMHFR outperforms 15 state of the arts(SOTA)methods on five public datasets.Specifically,it achieved significant improvements in mean DICE scores over the classic Parallel Reverse Attention Network(PraNet)method,with gains of 4.1%,2.2%,1.4%,8.9%,and 16.3%on the CVC-ClinicDB,Kvasir-SEG,CVC-T,CVC-ColonDB,and ETIS-LaribPolypDB datasets,respectively.展开更多
Remote sensing image segmentation has a wide range of applications in land cover classification,urban building recognition,crop monitoring,and other fields.In recent years,with the booming development of deep learning...Remote sensing image segmentation has a wide range of applications in land cover classification,urban building recognition,crop monitoring,and other fields.In recent years,with the booming development of deep learning,remote sensing image segmentation models based on deep learning have gradually emerged and produced a large number of scientific research achievements.This article is based on deep learning and reviews the latest achievements in remote sensing image segmentation,exploring future development directions.Firstly,the basic concepts,characteristics,classification,tasks,and commonly used datasets of remote sensingimages are presented.Secondly,the segmentation models based on deep learning were classified and summarized,and the principles,characteristics,and applications of various models were presented.Then,the key technologies involved in deep learning remote sensing image segmentation were introduced.Finally,the future development direction and applicationprospects of remote sensing image segmentation were discussed.This article reviews the latest research achievements in remote sensing image segmentationfrom the perspective of deep learning,which can provide reference and inspiration for the research of remote sensing image segmentation.展开更多
Accurately characterizing the storage space of fractured-vuggy carbonate reservoirs is a major technical challenge in the efficient exploration and development of the petroleum industry.Electrical image logs are an ef...Accurately characterizing the storage space of fractured-vuggy carbonate reservoirs is a major technical challenge in the efficient exploration and development of the petroleum industry.Electrical image logs are an effective technique for identifying and evaluating dissolution vugs in carbonate reservoirs.However,due to limitations in the wellbore structure and the design of instruments,the images of electrical image logs often contain numerous blank strips,which affects the accuracy of subsequent vug processing and interpretation.To finely evaluate the pore structu re of karst reservoirs and quantitatively characterize reservoir parameters,this study proposes an automatic identification method for dissolution vugs in electrical image logs,integrating image inpainting and regional segme ntation based on an improved deep image prior(I DIP)framework.Firstly,the I DIP neural network model,leveraging its structural characteristics,uses a random mask and image data as input to iteratively learn low-level features at known pixel points and extend these features to blank areas of the image.This approach allows clear capture of the structure and texture information of vugs in blank strips,even in the absence of sufficient training samples.Subsequently,based on the inpainted images,the Otsu algorithm is used to determine the optimal global threshold,and then the watershed algorithm is applied to segment and label the vug targets,which addresses the problem of over-segmentation when separating the vug information from the stratigraphic background.Finally,the Freeman chain code is used to store and calculate vug parameters,converting the picked vug area into areal porosity to quantitatively assess the develo p ment degree of fractures and vugs in the reservoir.The results show a good correlation with core porosity and are superior to calculations without image inpainting.This study presents a method based on image processing for vug identification and evaluation of karst re servoirs,demonstrating high consistency with actual field data and providing theoretical support and methodological refe rence for the classification and evaluation of similar reservoirs.展开更多
基金This research was supported by the Korea Institute for Advancement of Technology(KIAT)grant funded by the Korea Government(MOTIE)(P0012724,The Competency Development Program for Industry Specialist)and the Soonchunhyang University Research Fund.
文摘Image segmentation is vital when analyzing medical images,especially magnetic resonance(MR)images of the brain.Recently,several image segmentation techniques based on multilevel thresholding have been proposed for medical image segmentation;however,the algorithms become trapped in local minima and have low convergence speeds,particularly as the number of threshold levels increases.Consequently,in this paper,we develop a new multilevel thresholding image segmentation technique based on the jellyfish search algorithm(JSA)(an optimizer).We modify the JSA to prevent descents into local minima,and we accelerate convergence toward optimal solutions.The improvement is achieved by applying two novel strategies:Rankingbased updating and an adaptive method.Ranking-based updating is used to replace undesirable solutions with other solutions generated by a novel updating scheme that improves the qualities of the removed solutions.We develop a new adaptive strategy to exploit the ability of the JSA to find a best-so-far solution;we allow a small amount of exploration to avoid descents into local minima.The two strategies are integrated with the JSA to produce an improved JSA(IJSA)that optimally thresholds brain MR images.To compare the performances of the IJSA and JSA,seven brain MR images were segmented at threshold levels of 3,4,5,6,7,8,10,15,20,25,and 30.IJSA was compared with several other recent image segmentation algorithms,including the improved and standard marine predator algorithms,the modified salp and standard salp swarm algorithms,the equilibrium optimizer,and the standard JSA in terms of fitness,the Structured Similarity Index Metric(SSIM),the peak signal-to-noise ratio(PSNR),the standard deviation(SD),and the Features Similarity Index Metric(FSIM).The experimental outcomes and the Wilcoxon rank-sum test demonstrate the superiority of the proposed algorithm in terms of the FSIM,the PSNR,the objective values,and the SD;in terms of the SSIM,IJSA was competitive with the others.
基金supported by the National Key R&D Program of China(No.2022YFC2504403)the National Natural Science Foundation of China(No.62172202)+1 种基金the Experiment Project of China Manned Space Program(No.HYZHXM01019)the Fundamental Research Funds for the Central Universities from Southeast University(No.3207032101C3)。
文摘Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment.Their applications span from high-throughput drug screening to the modeling of complex diseases,with some even achieving clinical translation.Changes in the overall size,shape,boundary,and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity.However,the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and interference,including overlapping organoids,bubbles,dust particles,and cell fragments.This paper introduces the precision organoid segmentation technique(POST),which is a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions.Unlike existing methods,POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging.Furthermore,it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments.POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances and thereby streamlining the organoid analysis and drug screening process.
基金financially supported by the Open Project Program of Wuhan National Laboratory for Optoelectronics(No.2022WNLOKF009)the National Natural Science Foundation of China(No.62475216)+2 种基金the Key Research and Development Program of Shaanxi(No.2024GH-ZDXM-37)the Fujian Provincial Natural Science Foundation of China(No.2024J01060)the Startup Program of XMU,and the Fundamental Research Funds for the Central Universities.
文摘Microscopy imaging is fundamental in analyzing bacterial morphology and dynamics,offering critical insights into bacterial physiology and pathogenicity.Image segmentation techniques enable quantitative analysis of bacterial structures,facilitating precise measurement of morphological variations and population behaviors at single-cell resolution.This paper reviews advancements in bacterial image segmentation,emphasizing the shift from traditional thresholding and watershed methods to deep learning-driven approaches.Convolutional neural networks(CNNs),U-Net architectures,and three-dimensional(3D)frameworks excel at segmenting dense biofilms and resolving antibiotic-induced morphological changes.These methods combine automated feature extraction with physics-informed postprocessing.Despite progress,challenges persist in computational efficiency,cross-species generalizability,and integration with multimodal experimental workflows.Future progress will depend on improving model robustness across species and imaging modalities,integrating multimodal data for phenotype-function mapping,and developing standard pipelines that link computational tools with clinical diagnostics.These innovations will expand microbial phenotyping beyond structural analysis,enabling deeper insights into bacterial physiology and ecological interactions.
基金funded by the National Natural Science Foundation of China(Grant No.6240072655)the Hubei Provincial Key Research and Development Program(Grant No.2023BCB151)+1 种基金the Wuhan Natural Science Foundation Exploration Program(Chenguang Program,Grant No.2024040801020202)the Natural Science Foundation of Hubei Province of China(Grant No.2025AFB148).
文摘Image segmentation is attracting increasing attention in the field of medical image analysis.Since widespread utilization across various medical applications,ensuring and improving segmentation accuracy has become a crucial topic of research.With advances in deep learning,researchers have developed numerous methods that combine Transformers and convolutional neural networks(CNNs)to create highly accurate models for medical image segmentation.However,efforts to further enhance accuracy by developing larger and more complex models or training with more extensive datasets,significantly increase computational resource consumption.To address this problem,we propose BiCLIP-nnFormer(the prefix"Bi"refers to the use of two distinct CLIP models),a virtual multimodal instrument that leverages CLIP models to enhance the segmentation performance of a medical segmentation model nnFormer.Since two CLIP models(PMC-CLIP and CoCa-CLIP)are pre-trained on large datasets,they do not require additional training,thus conserving computation resources.These models are used offline to extract image and text embeddings from medical images.These embeddings are then processed by the proposed 3D CLIP adapter,which adapts the CLIP knowledge for segmentation tasks by fine-tuning.Finally,the adapted embeddings are fused with feature maps extracted from the nnFormer encoder for generating predicted masks.This process enriches the representation capabilities of the feature maps by integrating global multimodal information,leading to more precise segmentation predictions.We demonstrate the superiority of BiCLIP-nnFormer and the effectiveness of using CLIP models to enhance nnFormer through experiments on two public datasets,namely the Synapse multi-organ segmentation dataset(Synapse)and the Automatic Cardiac Diagnosis Challenge dataset(ACDC),as well as a self-annotated lung multi-category segmentation dataset(LMCS).
基金supported by the National Key Research and Development Project of China(No.2023YFB3709605)the National Natural Science Foundation of China(No.62073193)the National College Student Innovation Training Program(No.202310422122)。
文摘Potential high-temperature risks exist in heat-prone components of electric moped charging devices,such as sockets,interfaces,and controllers.Traditional detection methods have limitations in terms of real-time performance and monitoring scope.To address this,a temperature detection method based on infrared image processing has been proposed:utilizing the median filtering algorithm to denoise the original infrared image,then applying an image segmentation algorithm to divide the image.
基金supported by the National Natural Science(No.U19A2063)the Jilin Provincial Development Program of Science and Technology (No.20230201080GX)the Jilin Province Education Department Scientific Research Project (No.JJKH20230851KJ)。
文摘The visual noise of each light intensity area is different when the image is drawn by Monte Carlo method.However,the existing denoising algorithms have limited denoising performance under complex lighting conditions and are easy to lose detailed information.So we propose a rendered image denoising method with filtering guided by lighting information.First,we design an image segmentation algorithm based on lighting information to segment the image into different illumination areas.Then,we establish the parameter prediction model guided by lighting information for filtering(PGLF)to predict the filtering parameters of different illumination areas.For different illumination areas,we use these filtering parameters to construct area filters,and the filters are guided by the lighting information to perform sub-area filtering.Finally,the filtering results are fused with auxiliary features to output denoised images for improving the overall denoising effect of the image.Under the physically based rendering tool(PBRT)scene and Tungsten dataset,the experimental results show that compared with other guided filtering denoising methods,our method improves the peak signal-to-noise ratio(PSNR)metrics by 4.2164 dB on average and the structural similarity index(SSIM)metrics by 7.8%on average.This shows that our method can better reduce the noise in complex lighting scenesand improvethe imagequality.
基金Natural Science Foundation of Zhejiang Province,Grant/Award Number:LY23F020025Science and Technology Commissioner Program of Huzhou,Grant/Award Number:2023GZ42Sichuan Provincial Science and Technology Support Program,Grant/Award Numbers:2023ZHCG0005,2023ZHCG0008。
文摘Data augmentation plays an important role in training deep neural model by expanding the size and diversity of the dataset.Initially,data augmentation mainly involved some simple transformations of images.Later,in order to increase the diversity and complexity of data,more advanced methods appeared and evolved to sophisticated generative models.However,these methods required a mass of computation of training or searching.In this paper,a novel training-free method that utilises the Pre-Trained Segment Anything Model(SAM)model as a data augmentation tool(PTSAM-DA)is proposed to generate the augmented annotations for images.Without the need for training,it obtains prompt boxes from the original annotations and then feeds the boxes to the pre-trained SAM to generate diverse and improved annotations.In this way,annotations are augmented more ingenious than simple manipulations without incurring huge computation for training a data augmentation model.Multiple comparative experiments on three datasets are conducted,including an in-house dataset,ADE20K and COCO2017.On this in-house dataset,namely Agricultural Plot Segmentation Dataset,maximum improvements of 3.77%and 8.92%are gained in two mainstream metrics,mIoU and mAcc,respectively.Consequently,large vision models like SAM are proven to be promising not only in image segmentation but also in data augmentation.
基金supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China(Grant Nos.2023AH040149 and 2024AH051915)the Anhui Provincial Natural Science Foundation(Grant No.2208085MF168)+1 种基金the Science and Technology Innovation Tackle Plan Project of Maanshan(Grant No.2024RGZN001)the Scientific Research Fund Project of Anhui Medical University(Grant No.2023xkj122).
文摘Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.
基金National Key Research and Development Program of China,Grant/Award Number:2018YFE0206900China Postdoctoral Science Foundation,Grant/Award Number:2023M731204+2 种基金The Open Project of Key Laboratory for Quality Evaluation of Ultrasound Surgical Equipment of National Medical Products Administration,Grant/Award Number:SMDTKL-2023-1-01The Hubei Province Key Research and Development Project,Grant/Award Number:2023BCB007CAAI-Huawei MindSpore Open Fund。
文摘Convolutional neural network(CNN)with the encoder-decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global feature.The transformer can extract the global information well but adapting it to small medical datasets is challenging and its computational complexity can be heavy.In this work,a serial and parallel network is proposed for the accurate 3D medical image segmentation by combining CNN and transformer and promoting feature interactions across various semantic levels.The core components of the proposed method include the cross window self-attention based transformer(CWST)and multi-scale local enhanced(MLE)modules.The CWST module enhances the global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows.The MLE module selectively fuses features by computing the voxel attention between different branch features,and uses convolution to strengthen the dense local information.The experiments on the prostate,atrium,and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as dice similarity coefficient,Intersection over Union,95%Hausdorff distance and average symmetric surface distance.
基金supported by the Natural Science Foundation of China(No.41804112,author:Chengyun Song).
文摘Existing semi-supervisedmedical image segmentation algorithms use copy-paste data augmentation to correct the labeled-unlabeled data distribution mismatch.However,current copy-paste methods have three limitations:(1)training the model solely with copy-paste mixed pictures from labeled and unlabeled input loses a lot of labeled information;(2)low-quality pseudo-labels can cause confirmation bias in pseudo-supervised learning on unlabeled data;(3)the segmentation performance in low-contrast and local regions is less than optimal.We design a Stochastic Augmentation-Based Dual-Teaching Auxiliary Training Strategy(SADT),which enhances feature diversity and learns high-quality features to overcome these problems.To be more precise,SADT trains the Student Network by using pseudo-label-based training from Teacher Network 1 and supervised learning with labeled data,which prevents the loss of rare labeled data.We introduce a bi-directional copy-pastemask with progressive high-entropy filtering to reduce data distribution disparities and mitigate confirmation bias in pseudo-supervision.For the mixed images,Deep-Shallow Spatial Contrastive Learning(DSSCL)is proposed in the feature spaces of Teacher Network 2 and the Student Network to improve the segmentation capabilities in low-contrast and local areas.In this procedure,the features retrieved by the Student Network are subjected to a random feature perturbation technique.On two openly available datasets,extensive trials show that our proposed SADT performs much better than the state-ofthe-art semi-supervised medical segmentation techniques.Using only 10%of the labeled data for training,SADT was able to acquire a Dice score of 90.10%on the ACDC(Automatic Cardiac Diagnosis Challenge)dataset.
基金supported by the VTT Technical Research Centre of Finland and the work of Nayef Alqahtani is supported by the Deanship of Scientific Research,Vice Presidency for Graduate Studies and Scientific Research,King Faisal University,Saudi Arabia(Grant KFU251882).
文摘Lung cancer(LC)is a major cancer which accounts for higher mortality rates worldwide.Doctors utilise many imaging modalities for identifying lung tumours and their severity in earlier stages.Nowadays,machine learning(ML)and deep learning(DL)methodologies are utilised for the robust detection and prediction of lung tumours.Recently,multi modal imaging emerged as a robust technique for lung tumour detection by combining various imaging features.To cope with that,we propose a novel multi modal imaging technique named versatile scale malleable image integration and patch wise attention network(VSMI2−PANet)which adopts three imaging modalities named computed tomography(CT),magnetic resonance imaging(MRI)and single photon emission computed tomography(SPECT).The designed model accepts input from CT and MRI images and passes it to the VSMI2 module that is composed of three sub-modules named image cropping module,scale malleable convolution layer(SMCL)and PANet module.CT and MRI images are subjected to image cropping module in a parallel manner to crop the meaningful image patches and provide them to the SMCL module.The SMCL module is composed of adaptive convolutional layers that investigate those patches in a parallel manner by preserving the spatial information.The output from the SMCL is then fused and provided to the PANet module.The PANet module examines the fused patches by analysing its height,width and channels of the image patch.As a result,it provides an output as high-resolution spatial attention maps indicating the location of suspicious tumours.The high-resolution spatial attention maps are then provided as an input to the backbone module which uses light wave transformer(LWT)for segmenting the lung tumours into three classes,such as normal,benign and malignant.In addition,the LWT also accepts SPECT image as input for capturing the variations precisely to segment the lung tumours.The performance of the proposed model is validated using several performance metrics,such as accuracy,precision,recall,F1-score and AUC curve,and the results show that the proposed work outperforms the existing approaches.
基金supported by the Science Committee of the Ministry of Higher Education and Science of the Republic of Kazakhstan within the framework of grant AP23489899“Applying Deep Learning and Neuroimaging Methods for Brain Stroke Diagnosis”.
文摘Deep learning now underpins many state-of-the-art systems for biomedical image and signal processing,enabling automated lesion detection,physiological monitoring,and therapy planning with accuracy that rivals expert performance.This survey reviews the principal model families as convolutional,recurrent,generative,reinforcement,autoencoder,and transfer-learning approaches as emphasising how their architectural choices map to tasks such as segmentation,classification,reconstruction,and anomaly detection.A dedicated treatment of multimodal fusion networks shows how imaging features can be integrated with genomic profiles and clinical records to yield more robust,context-aware predictions.To support clinical adoption,we outline post-hoc explainability techniques(Grad-CAM,SHAP,LIME)and describe emerging intrinsically interpretable designs that expose decision logic to end users.Regulatory guidance from the U.S.FDA,the European Medicines Agency,and the EU AI Act is summarised,linking transparency and lifecycle-monitoring requirements to concrete development practices.Remaining challenges as data imbalance,computational cost,privacy constraints,and cross-domain generalization are discussed alongside promising solutions such as federated learning,uncertainty quantification,and lightweight 3-D architectures.The article therefore offers researchers,clinicians,and policymakers a concise,practice-oriented roadmap for deploying trustworthy deep-learning systems in healthcare.
文摘Medical image segmentation has become a cornerstone for many healthcare applications,allowing for the automated extraction of critical information from images such as Computed Tomography(CT)scans,Magnetic Resonance Imaging(MRIs),and X-rays.The introduction of U-Net in 2015 has significantly advanced segmentation capabilities,especially for small datasets commonly found in medical imaging.Since then,various modifications to the original U-Net architecture have been proposed to enhance segmentation accuracy and tackle challenges like class imbalance,data scarcity,and multi-modal image processing.This paper provides a detailed review and comparison of several U-Net-based architectures,focusing on their effectiveness in medical image segmentation tasks.We evaluate performance metrics such as Dice Similarity Coefficient(DSC)and Intersection over Union(IoU)across different U-Net variants including HmsU-Net,CrossU-Net,mResU-Net,and others.Our results indicate that architectural enhancements such as transformers,attention mechanisms,and residual connections improve segmentation performance across diverse medical imaging applications,including tumor detection,organ segmentation,and lesion identification.The study also identifies current challenges in the field,including data variability,limited dataset sizes,and issues with class imbalance.Based on these findings,the paper suggests potential future directions for improving the robustness and clinical applicability of U-Net-based models in medical image segmentation.
基金supported by the Doctoral INPh INIT Retaining Fellowship(LCF/BQ/DR22/11950023)from la Caixa Foundation(ID 100010434,Barcelona,Spain)with additional funding through grant PID2021127241OB I00 from the Spanish Ministry of Science and Innovation/Agen-cia Estatal de Investigación(MCIN/AEI/10.13039/501100011033)co funded by the European Regional Development Fund(ERDF)under the A Way of Making Europe。
文摘Land use/land cover(LULC)change monitoring is critical for understanding environmental and socioeconomic processes and to identify patterns that may affect current and future land management.Forest cover evolution in the Mediterranean region has been studied to better understand forest succession,wildfires potential,and carbon stock assessment for climate change mitigation,among other reasons.However,though multiple sources of current LULC exist,data from last century's forest cover are less common,and are normally still reliant on locally orthophoto-interpreted data,making continuous maps of historical forest cover relatively uncommon.In this work,a pipeline based on image segmentation and random forest LULC modeling was developed to process three high resolution orthophotos(1956,1989,and 2021)into LULC continuous land cover maps of Spain's island of Ibiza.Next,they were combined to quantify forest evolution of Mediterranean Aleppo pine(Pinus halepensis Mill.)and to generate a continuous map of forest age classes.Our models were able to differentiate forestland with an accuracy higher than 80%in all cases,and were able to approximate forestland cover change since the mid-twentieth century,estimating 21,165±252 ha(37.0±0.4%)in 1956,27,099±472 ha(46.8±0.8%)in 1989,and 30,195±302 ha(52.8±0.5%)in 2021,with a mean increase of 139±6 ha(0.46±0.02%,calculated from current forest cover estimate)per year.The most important variables for the identification of the forestland were the terrain slope and the image gray level or color information in all orthophotos.When combining the information from the three periods,the analysis of forest evolution revealed that a significant portion of current forest cover,approximately 15,776 ha,fell within the 75-120 year age range,while 5388 ha fell within the range of 42-74 years,and 9022 ha within the 10-41 years forest age class.Younger forests,except when mapped after known wildfires,were not considered due to the limitations of the methodology.When compared to forest age data based on ground measurements,significant differences were found among each of the remotely sensed forest age classes,with a mean difference of 13 years between the theoretical age class central value and the real observed plot average age.Overall,63%of the forest inventory plots were assigned with the correct forest age class.This work will allow a better understanding of long-term Mediterranean forest dynamics and will help landowners and policymakers to better respond to new landscape planning challenges and achieve sustainable development goals.
基金supported by the Deanship of Research and Graduate Studies at King Khalid University under Small Research Project grant number RGP1/139/45.
文摘Integrating multiple medical imaging techniques,including Magnetic Resonance Imaging(MRI),Computed Tomography,Positron Emission Tomography(PET),and ultrasound,provides a comprehensive view of the patient health status.Each of these methods contributes unique diagnostic insights,enhancing the overall assessment of patient condition.Nevertheless,the amalgamation of data from multiple modalities presents difficulties due to disparities in resolution,data collection methods,and noise levels.While traditional models like Convolutional Neural Networks(CNNs)excel in single-modality tasks,they struggle to handle multi-modal complexities,lacking the capacity to model global relationships.This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system.The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities.Additionally,it shows resilience to variations in noise and image quality,making it adaptable for real-time clinical use.To address the computational hurdles linked to transformer models,particularly in real-time clinical applications in resource-constrained environments,several optimization techniques have been integrated to boost scalability and efficiency.Initially,a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness.Methods such as model pruning,quantization,and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed.Furthermore,efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations.For further deployment optimization,researchers have implemented hardware-aware acceleration strategies,including the use of TensorRT and ONNX-based model compression,to ensure efficient execution on edge devices.These optimizations allow the approach to function effectively in real-time clinical settings,ensuring viability even in environments with limited resources.Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments.This study highlights the transformative potential of transformer models in multi-modal medical imaging,offering improvements in diagnostic accuracy and patient care outcomes.
基金supported by the National Key R&D Program of China(2021YFC2600400 and 2023YFC2605200)the National Key Research Program of China(2021YFD1401100)the“San Nong Jiu Fang”Sciences and Technologies Cooperation Project of Zhejiang Province,China(2024SNJF010)。
文摘Agromyzid leafminers cause significant economic losses in both vegetable and horticultural crops,and precise assessments of pesticide needs must be based on the extent of leaf damage.Traditionally,surveyors estimate the damage by visually comparing the proportion of damaged to intact leaf area,a method that lacks objectivity,precision,and reliable data traceability.To address these issues,an advanced survey system that combines augmented reality(AR)glasses with a camera and an artificial intelligence(AI)algorithm was developed in this study to objectively and accurately assess leafminer damage in the feld.By wearing AR glasses equipped with a voice-controlled camera,surveyors can easily flatten damaged leaves by hand and capture images for analysis.This method can provide a precise and reliable diagnosis of leafminer damage levels,which in turn supports the implementation of scientifically grounded and targeted pest management strategies.To calculate the leafminer damage level,the DeepLab-Leafminer model was proposed to precisely segment the leafminer-damaged regions and the intact leaf region.The integration of an edge-aware module and a Canny loss function into the DeepLabv3+model enhanced the DeepLab-Leafminer model's capability to accurately segment the edges of leafminer-damaged regions,which often exhibit irregular shapes.Compared with state-of-the-art segmentation models,the DeepLabLeafminer model achieved superior segmentation performance with an Intersection over Union(IoU)of 81.23%and an F1score of 87.92%on leafminer-damaged leaves.The test results revealed a 92.38%diagnosis accuracy of leafminer damage levels based on the DeepLab-Leafminer model.A mobile application and a web platform were developed to assist surveyors in displaying the diagnostic results of leafminer damage levels.This system provides surveyors with an advanced,user-friendly,and accurate tool for assessing agromyzid leafminer damage in agricultural felds using wearable AR glasses and an AI model.This method can also be utilized to automatically diagnose pest and disease damage levels in other crops based on leaf images.
基金supported by National Natural Science Foundation of China(No.61761027)Gansu Young Doctor’s Fund for Higher Education Institutions(No.2021QB-053)。
文摘The traditional EnFCM(Enhanced fuzzy C-means)algorithm only considers the grey-scale features in image segmentation,resulting in less than satisfactory results when the algorithm is used for remote sensing woodland image segmentation and extraction.An EnFCM remote sensing forest land extraction method based on PCA multi-feature fusion was proposed.Firstly,histogram equalization was applied to improve the image contrast.Secondly,the texture and edge features of the image were extracted,and a multi-feature fused pixel image was generated using the PCA technique.Moreover,the fused feature was used as a feature constraint to measure the difference of pixels instead of a single grey-scale feature.Finally,an improved feature distance metric calculated the similarity between the pixel points and the cluster center to complete the cluster segmentation.The experimental results showed that the error was between 1.5%and 4.0%compared with the forested area counted by experts’hand-drawing,which could obtain a high accuracy segmentation and extraction result.
基金supported by Xiamen Medical and Health Guidance Project in 2021(No.3502Z20214ZD1070)supported by a grant from Guangxi Key Laboratory of Machine Vision and Intelligent Control,China(No.2023B02).
文摘The self-attention mechanism of Transformers,which captures long-range contextual information,has demonstrated significant potential in image segmentation.However,their ability to learn local,contextual relationships between pixels requires further improvement.Previous methods face challenges in efficiently managing multi-scale fea-tures of different granularities from the encoder backbone,leaving room for improvement in their global representation and feature extraction capabilities.To address these challenges,we propose a novel Decoder with Multi-Head Feature Receptors(DMHFR),which receives multi-scale features from the encoder backbone and organizes them into three feature groups with different granularities:coarse,fine-grained,and full set.These groups are subsequently processed by Multi-Head Feature Receptors(MHFRs)after feature capture and modeling operations.MHFRs include two Three-Head Feature Receptors(THFRs)and one Four-Head Feature Receptor(FHFR).Each group of features is passed through these MHFRs and then fed into axial transformers,which help the model capture long-range dependencies within the features.The three MHFRs produce three distinct feature outputs.The output from the FHFR serves as auxiliary auxiliary features in the prediction head,and the prediction output and their losses will eventually be aggregated.Experimental results show that the Transformer using DMHFR outperforms 15 state of the arts(SOTA)methods on five public datasets.Specifically,it achieved significant improvements in mean DICE scores over the classic Parallel Reverse Attention Network(PraNet)method,with gains of 4.1%,2.2%,1.4%,8.9%,and 16.3%on the CVC-ClinicDB,Kvasir-SEG,CVC-T,CVC-ColonDB,and ETIS-LaribPolypDB datasets,respectively.
文摘Remote sensing image segmentation has a wide range of applications in land cover classification,urban building recognition,crop monitoring,and other fields.In recent years,with the booming development of deep learning,remote sensing image segmentation models based on deep learning have gradually emerged and produced a large number of scientific research achievements.This article is based on deep learning and reviews the latest achievements in remote sensing image segmentation,exploring future development directions.Firstly,the basic concepts,characteristics,classification,tasks,and commonly used datasets of remote sensingimages are presented.Secondly,the segmentation models based on deep learning were classified and summarized,and the principles,characteristics,and applications of various models were presented.Then,the key technologies involved in deep learning remote sensing image segmentation were introduced.Finally,the future development direction and applicationprospects of remote sensing image segmentation were discussed.This article reviews the latest research achievements in remote sensing image segmentationfrom the perspective of deep learning,which can provide reference and inspiration for the research of remote sensing image segmentation.
基金supported by the National Natural Science Foundation of China(Grant No.42272180)。
文摘Accurately characterizing the storage space of fractured-vuggy carbonate reservoirs is a major technical challenge in the efficient exploration and development of the petroleum industry.Electrical image logs are an effective technique for identifying and evaluating dissolution vugs in carbonate reservoirs.However,due to limitations in the wellbore structure and the design of instruments,the images of electrical image logs often contain numerous blank strips,which affects the accuracy of subsequent vug processing and interpretation.To finely evaluate the pore structu re of karst reservoirs and quantitatively characterize reservoir parameters,this study proposes an automatic identification method for dissolution vugs in electrical image logs,integrating image inpainting and regional segme ntation based on an improved deep image prior(I DIP)framework.Firstly,the I DIP neural network model,leveraging its structural characteristics,uses a random mask and image data as input to iteratively learn low-level features at known pixel points and extend these features to blank areas of the image.This approach allows clear capture of the structure and texture information of vugs in blank strips,even in the absence of sufficient training samples.Subsequently,based on the inpainted images,the Otsu algorithm is used to determine the optimal global threshold,and then the watershed algorithm is applied to segment and label the vug targets,which addresses the problem of over-segmentation when separating the vug information from the stratigraphic background.Finally,the Freeman chain code is used to store and calculate vug parameters,converting the picked vug area into areal porosity to quantitatively assess the develo p ment degree of fractures and vugs in the reservoir.The results show a good correlation with core porosity and are superior to calculations without image inpainting.This study presents a method based on image processing for vug identification and evaluation of karst re servoirs,demonstrating high consistency with actual field data and providing theoretical support and methodological refe rence for the classification and evaluation of similar reservoirs.