Instance segmentation of impacted teeth in oral panoramic X-ray images is an active research topic. However, because teeth in panoramic X-ray images have complex structures, low contrast, and complex backgrounds, instance segmentation is technically difficult. In particular, the contrast between impacted teeth and periodontal tissues such as the gingiva, periodontal membrane, and alveolar bone is low, which makes the boundaries of impacted teeth fuzzy. A model based on Teeth YOLACT is proposed to provide a more efficient and accurate solution for segmenting impacted teeth in oral panoramic X-ray films. First, a Multi-scale Res-Transformer Module (MRTM) is designed. In this module, depthwise separable convolutions with different receptive fields increase the model's sensitivity to lesion size, and a Vision Transformer is integrated to improve the model's ability to perceive global features. Second, a Context Interaction-awareness Module (CIaM) is designed to fuse deep and shallow features. The deep semantic features guide the shallow spatial features. The shallow spatial features are then embedded into the deep semantic features, and a cross-weighted attention mechanism aggregates the deep and shallow features efficiently to obtain richer context information. Third, an Edge-preserving Perception Module (E2PM) is designed to enhance tooth edge features. A first-order differential operator is used to obtain tooth edge weights, which improves the perception of tooth edge features. The shallow spatial features are then fused through linear mapping, weight concatenation, and matrix multiplication to preserve tooth edge information. Finally, comparison and ablation experiments are conducted on oral panoramic X-ray image datasets. The results show that the APdet, APseg, ARdet, ARseg, mAPdet, and mAPseg of the proposed model are 89.9%, 91.9%, 77.4%, 77.6%, 72.8%, and 73.5%, respectively. This study further verifies the application potential of combining multi-scale feature extraction, multi-scale feature fusion, and edge perception enhancement in medical image segmentation, and provides a valuable reference for future related research.
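The abstract states only that E2PM derives tooth edge weights from a first-order differential operator. The sketch below illustrates that idea under one assumption: the operator is taken to be the Sobel pair, with gradient magnitudes normalized by their maximum. Neither choice comes from the paper. The function name `sobel_edge_weights` is hypothetical.

```python
# Minimal sketch: edge weights from a first-order differential operator.
# Assumption: Sobel kernels and max-normalization; the paper's exact E2PM
# operator and normalization are not given in the abstract.
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edge_weights(img):
    """Return a weight map in [0, 1]: gradient magnitude / its maximum.

    Border pixels are left at 0 (no padding) to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            mag[y][x] = math.hypot(gx, gy)
    peak = max(max(row) for row in mag)
    if peak == 0:
        return mag  # flat image: no edges anywhere
    return [[v / peak for v in row] for row in mag]


# A vertical step edge: dark left half, bright right half.
image = [[0, 0, 10, 10]] * 4
weights = sobel_edge_weights(image)
```

Pixels next to the step receive weight 1.0, and pixels in flat regions receive 0. Multiplying a feature map by such weights is one plausible way to emphasize boundaries.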
Instance segmentation is crucial in various domains, such as autonomous driving and robotics. However, the detection speed of instance segmentation algorithms on edge devices leaves room for improvement, so it is essential to increase detection speed while maintaining high accuracy. In this study, we propose You Only Look Once-Layer Fusion (YOLO-LF), a lightweight instance segmentation method specifically designed to optimize instance segmentation speed for autonomous driving applications. Based on the You Only Look Once version 8 nano (YOLOv8n) framework, we introduce a lightweight convolutional module and design a lightweight layer aggregation module called Reparameterization convolution and Partial convolution Efficient Layer Aggregation Networks (RPELAN). This module reduces the impact of the redundant information produced by traditional convolutional stacking on network size and detection speed, while enhancing the capability to process feature information. We experimentally verified that our generalized lightweighting method for one-stage detection networks, based on Grouped Spatial Convolution (GSConv), increases detection speed while maintaining accuracy across various state-of-the-art (SOTA) networks. Experiments on the publicly available Cityscapes dataset demonstrated that, compared with YOLOv8n, YOLO-LF maintained the same accuracy (mAP@0.5 of 37.9%), reduced model size by 14.3% from 3.259 M to 2.804 M, and increased frames per second (FPS) by 14.48% from 57.47 to 65.79. These results demonstrate its potential for real-time instance segmentation on edge devices.
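RPELAN relies partly on partial convolution to cut redundant computation. The arithmetic below shows why this helps. It assumes a FasterNet-style partial convolution, in which a regular convolution runs on only a fraction of the channels and the rest pass through unchanged. The 1/4 ratio and the feature-map sizes are illustrative assumptions, not values from the paper.

```python
def conv_flops(h, w, k, c_in, c_out):
    """Multiply-accumulates of a k x k convolution producing an h x w output map."""
    return h * w * k * k * c_in * c_out


def partial_conv_flops(h, w, k, c, ratio=0.25):
    """Only c_p = ratio * c channels are convolved; the others are passed through.

    The 0.25 default is an assumption for illustration.
    """
    c_p = int(c * ratio)
    return conv_flops(h, w, k, c_p, c_p)


# Hypothetical 80 x 80 feature map with 64 channels and a 3 x 3 kernel.
full = conv_flops(80, 80, 3, 64, 64)
pconv = partial_conv_flops(80, 80, 3, 64)
print(full / pconv)  # 16.0
```

Because the cost scales with the square of the convolved channel count, a ratio of 1/4 costs about 1/16 of a full convolution.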
Tree trunk instance segmentation is crucial for under-canopy unmanned aerial vehicles (UAVs) to autonomously extract standing tree stem attributes. Using cameras as sensors keeps these UAVs compact and lightweight, facilitating safe and flexible navigation in dense forests. However, their limited onboard computational power makes real-time, image-based tree trunk segmentation challenging, which underscores the need for lightweight and efficient segmentation models. In this study, we present RT-Trunk, a model specifically designed for real-time tree trunk instance segmentation in complex forest environments. To ensure real-time performance, we selected SparseInst as the base framework. We adopted ConvNeXt-T as the backbone to enhance feature extraction for tree trunks, thereby improving segmentation accuracy. We further integrated the lightweight convolutional block attention module (CBAM). It lets the model focus on tree trunk features while suppressing irrelevant information, which yields additional gains in segmentation accuracy. To enable RT-Trunk to operate effectively in diverse, complex forest environments, we constructed a comprehensive training and testing dataset. It combines self-collected data with multiple public datasets covering different locations, seasons, weather conditions, tree species, and levels of forest clutter. Compared with other tree trunk segmentation methods, RT-Trunk achieved an average precision of 91.4% and the fastest inference speed, 32.9 frames per second. Overall, RT-Trunk provides superior trunk segmentation performance that balances speed and accuracy. This makes it a promising solution for supporting under-canopy UAVs in the autonomous extraction of standing tree stem attributes. The code for this work is available at https://github.com/NEFU CVRG/RT Trunk.
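CBAM re-weights features first per channel and then per spatial position. The sketch covers only the channel branch. It average-pools and max-pools each channel, passes both vectors through a shared two-layer MLP, adds the results, and applies a sigmoid. The spatial branch is omitted for brevity. The MLP weights here are hypothetical toy values, not learned ones.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shared_mlp(v, w1, w2):
    """Two-layer MLP (C -> C/r -> C) with ReLU, shared by both pooled vectors."""
    hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w1]
    return [sum(a * b for a, b in zip(row, hidden)) for row in w2]


def cbam_channel_attention(feat, w1, w2):
    """feat: C x H x W nested lists. Returns (rescaled features, channel weights)."""
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(max(r) for r in ch) for ch in feat]
    a, m = shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2)
    weights = [sigmoid(x + y) for x, y in zip(a, m)]
    out = [[[v * wc for v in row] for row in ch] for ch, wc in zip(feat, weights)]
    return out, weights


# Two 2 x 2 channels: one active, one empty. Toy weights with C = 2, C/r = 1.
feat = [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 0.0]]
w2 = [[1.0], [-1.0]]
out, weights = cbam_channel_attention(feat, w1, w2)
```

The active channel gets a weight near 1 (sigmoid(4)) and the empty one a weight near 0. This is the "focus on relevant features, suppress the rest" behaviour the abstract describes.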
Real-time detection and instance segmentation of strawberries are fundamental components of strawberry harvesting robots. Real-time identification of strawberries in an unstructured environment is a challenging task, and current instance segmentation algorithms for strawberries suffer from poor real-time performance and low accuracy. To address this, the present study proposes an Efficient YOLACT (E-YOLACT) algorithm for strawberry detection and segmentation based on the YOLACT framework. The key enhancements of E-YOLACT include a lightweight attention mechanism, pyramid squeeze shuffle attention (PSSA), for efficient feature extraction. In addition, an attention-guided context feature pyramid network (AC-FPN) replaces the FPN to optimize the architecture's performance. Furthermore, a feature-enhanced model (FEM) is introduced to strengthen the prediction head, and efficient fast non-maximum suppression (EF-NMS) is devised to improve non-maximum suppression. The experimental results show that E-YOLACT achieves a Box-mAP of 77.9 and a Mask-mAP of 76.6 on the custom dataset, with a category accuracy of 93.5%. E-YOLACT also runs in real time at 34.8 FPS. The proposed method offers an efficient approach for the vision system of a strawberry-picking robot.
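EF-NMS builds on fast non-maximum suppression. The sketch shows plain single-class Fast NMS as introduced with YOLACT. It sorts boxes by score and keeps a box only if its largest IoU with any higher-scoring box is at or below the threshold. It does not reproduce the paper's EF-NMS modifications, which the abstract does not describe.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fast_nms(boxes, scores, iou_thresh=0.5):
    """Return indices (into the inputs) of boxes kept by single-class Fast NMS."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for pos, j in enumerate(order):
        # Compare against ALL higher-scoring boxes, even ones already suppressed.
        # This is what makes Fast NMS parallelizable as a single IoU matrix.
        max_iou = max((box_iou(boxes[order[p]], boxes[j]) for p in range(pos)), default=0.0)
        if max_iou <= iou_thresh:
            keep.append(j)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
print(fast_nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

Suppressed boxes can still suppress others. Fast NMS therefore removes slightly more boxes than greedy NMS, and in exchange it avoids the sequential loop on a GPU.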
Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea. Traditional tea-picking machines may compromise the quality of the tea leaves. High-quality teas are often picked by hand and require more delicate operation from intelligent picking machines. Compared with traditional image processing techniques, deep learning models have stronger feature extraction and generalization capabilities and are better suited to practical tea shoot harvesting. However, current research mostly focuses on shoot detection and cannot directly perform end-to-end shoot segmentation. We propose a tea shoot instance segmentation model based on multi-scale mixed attention (Mask2FusionNet), using a dataset collected from a tea garden in Hangzhou. We further analyzed the characteristics of the tea shoot dataset, in which small and medium-sized targets account for 89.9% of the targets. Compared with several mainstream object segmentation algorithms, our model achieves an accuracy of 82% in recognizing tea shoots, outperforming the other models. Ablation experiments show that ResNet50, the PointRend strategy, and the Feature Pyramid Network (FPN) architecture improve performance by 1.6%, 1.4%, and 2.4%, respectively. These experiments demonstrate that the proposed multi-scale and point selection strategy improves feature extraction for overlapping small targets. The results indicate that Mask2FusionNet can segment shoots in unstructured environments. It distinguishes individual tea shoots and completely extracts shoot edge contours, with a segmentation accuracy of 82.0%. The research results can provide algorithmic support for the segmentation and intelligent harvesting of premium tea shoots at different scales.
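The PointRend strategy in the ablation refines masks at selected points rather than everywhere. At inference time, PointRend picks the points whose coarse mask probability is closest to 0.5, that is, the most uncertain ones. The sketch implements only that selection step on a small coarse mask. How Mask2FusionNet configures PointRend is not stated in the abstract.

```python
def most_uncertain_points(coarse_mask, k):
    """Return (row, col) of the k cells whose probability is closest to 0.5.

    Uncertainty is measured as -|p - 0.5|; ties are broken by position.
    """
    cells = [(abs(p - 0.5), r, c)
             for r, row in enumerate(coarse_mask)
             for c, p in enumerate(row)]
    cells.sort()
    return [(r, c) for _, r, c in cells[:k]]


coarse = [[0.1, 0.52],
          [0.9, 0.45]]
print(most_uncertain_points(coarse, 2))  # [(0, 1), (1, 1)]
```

These boundary-like points are where a point head would recompute predictions from finer features. This helps with the overlapping, small shoots the abstract highlights.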
This study presents single-class and multi-class instance segmentation of ancient Palmyrene inscriptions, employing two state-of-the-art deep learning algorithms, YOLOv8 and Roboflow 3.0. The goal is to contribute to the preservation and understanding of historical texts and to showcase the potential of modern deep learning methods in archaeological research. Our research yields several key findings and scientific contributions. We comprehensively compare the performance of YOLOv8 and Roboflow 3.0 for Palmyrene character segmentation, focusing on the strengths and weaknesses of each algorithm in this context. We also created and annotated an extensive dataset of Palmyrene inscriptions, a crucial resource for further research in the field, which serves for training and evaluating the segmentation models. We use comparative evaluation metrics to assess the segmentation results quantitatively, ensuring the reliability and reproducibility of our findings. We also present custom visualization tools for predicted segmentation masks. Our study advances the state of the art in semi-automatic reading of Palmyrene inscriptions and establishes a benchmark for future research. The availability of the Palmyrene dataset and the insights into algorithm performance contribute to the broader understanding of historical text analysis.
Precise detection and segmentation of tumor lesions are very important for computer-aided diagnosis of lung cancer. However, in PET/CT (Positron Emission Tomography/Computed Tomography) lung images, lesion shapes are complex, edges are blurred, and sample numbers are unbalanced. To solve these problems, this paper proposes a Multi-branch Cross-scale Interactive Feature fusion Transformer model (MCIF-Transformer Mask RCNN) for PET/CT lung tumor instance segmentation. The main contributions of this paper are as follows. First, a ResNet-Transformer backbone network extracts global and local features from lung images. Pixel dependency relationships are established in local and non-local fields to improve the model's perception ability. Second, a Cross-scale Interactive Feature Enhancement auxiliary network is designed to provide shallow features to the deep features. A cross-scale interactive feature enhancement module (CIFEM) strengthens attention to fine-grained features. Third, a Cross-scale Interactive Feature fusion FPN network (CIF-FPN) is constructed to realize bidirectional interactive fusion between deep and shallow features, so that low-level features are enhanced within the deep semantic features. Finally, experiments are performed on PET/CT lung medical image datasets: 4 ablation experiments, 3 detection comparison experiments, 3 segmentation comparison experiments, and 6 comparison experiments against two-stage and single-stage instance segmentation networks. The results show that the APdet, APseg, ARdet, and ARseg indexes improve by 5.5%, 5.15%, 3.11%, and 6.79%, respectively, compared with Mask RCNN (ResNet50). Based on this research, precise detection and segmentation of the lesion region are achieved. The method has positive significance for the detection of lung tumors.
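CIF-FPN fuses deep and shallow features in both directions. A generic way to do this is a top-down pass followed by a bottom-up pass, as in FPN/PANet-style necks. In the top-down pass, deep features are upsampled and added to shallow ones. In the bottom-up pass, the fused shallow map is downsampled and added back to the deep one. The single-channel maps, nearest-neighbour upsampling, and max pooling below are simplifying assumptions. They do not describe the CIF-FPN or CIFEM internals.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D map."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def maxpool2x(fmap):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [[max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, len(fmap[0]), 2)]
            for y in range(0, len(fmap), 2)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_fuse(shallow, deep):
    """Top-down then bottom-up fusion of one shallow (2x size) and one deep map."""
    shallow_out = add(shallow, upsample2x(deep))   # deep semantics -> shallow
    deep_out = add(deep, maxpool2x(shallow_out))   # shallow detail -> deep
    return shallow_out, deep_out


shallow = [[5, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
deep = [[1, 2], [3, 4]]
s_out, d_out = bidirectional_fuse(shallow, deep)
print(d_out)  # [[7, 4], [6, 8]]
```

The strong shallow response at the top-left corner propagates into the deep map after fusion. In this sense the low-level detail is "enhanced within the deep semantic features".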
Dynamic Simultaneous Localization and Mapping (SLAM) in visual scenes is currently a major research area in fields such as robot navigation and autonomous driving. However, in complex real-world environments, current dynamic SLAM systems struggle to achieve precise localization and map construction. With the advancement of deep learning, interest in deep learning-based dynamic SLAM visual odometry has grown in recent years, and more researchers are turning to deep learning techniques to address the challenges of dynamic SLAM. Other deep learning-based dynamic SLAM systems rely on methods such as object detection and semantic segmentation. Compared with them, systems based on instance segmentation can not only detect dynamic objects in the scene but also distinguish different instances of the same type of object. This reduces the impact of dynamic objects on the system's localization. This article introduces traditional dynamic SLAM systems based on mathematical models. It also provides a comprehensive analysis of existing instance segmentation algorithms and of dynamic SLAM systems based on instance segmentation, comparing and summarizing their advantages and disadvantages. Comparisons on datasets show that instance segmentation-based methods have significant advantages in accuracy and robustness in dynamic environments. However, the limited real-time performance of instance segmentation algorithms hinders the widespread application of such dynamic SLAM systems. In recent years, the rapid development of single-stage instance segmentation methods has brought hope for the widespread application of dynamic SLAM systems based on instance segmentation. Finally, possible future research directions and improvement measures are discussed for reference by relevant professionals.
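Instance-level masks let a SLAM front end discard feature points on the objects that are actually moving while keeping points on, for example, a parked car of the same class. The sketch assumes the set of moving instance IDs has already been decided by some other check. That check and the function name are hypothetical; the survey does not prescribe this code.

```python
def filter_static_keypoints(keypoints, instance_masks, moving_ids):
    """Keep keypoints that do not fall on any instance flagged as moving.

    keypoints: list of (u, v) integer pixel coordinates.
    instance_masks: list of (instance_id, mask), where mask[v][u] is truthy on the object.
    moving_ids: set of instance ids judged dynamic (assumed to be given).
    """
    static = []
    for u, v in keypoints:
        on_moving = any(iid in moving_ids and mask[v][u] for iid, mask in instance_masks)
        if not on_moving:
            static.append((u, v))
    return static


# Two cars of the same class: car_a is moving, car_b is parked.
car_a = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
car_b = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
kept = filter_static_keypoints([(0, 1), (1, 1), (2, 1)],
                               [("car_a", car_a), ("car_b", car_b)],
                               moving_ids={"car_a"})
print(kept)  # [(1, 1), (2, 1)]
```

A semantic mask could only drop every "car" pixel. The instance-level version keeps the parked car's points, which remain useful for pose estimation.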
Instance segmentation plays an important role in image processing. The Deep Snake algorithm, based on contour iteration, deforms an initial bounding box into an instance contour end-to-end. This can improve instance segmentation performance, but the algorithm suffers from slow segmentation speed and sub-optimal initial contours. To solve these problems, a real-time instance segmentation algorithm based on contour learning was proposed. First, ShuffleNet V2 was used as the backbone network, and the model's receptive field was expanded with a 5×5 convolution kernel. Second, a lightweight up-sampling module, multi-stage aggregation (MSA), performs residual fusion of multi-layer features. This not only improves segmentation speed but also extracts effective features more comprehensively. Third, a learned contour initialization method was designed, in which a global contour feature aggregation mechanism regresses a coarse contour. This reduces the large error between a manually initialized contour and the real contour. Finally, the Snake deformation module iteratively refines the coarse contour to obtain the final instance contour. The experimental results showed that the proposed method improved instance segmentation accuracy on the Semantic Boundaries Dataset (SBD), Cityscapes, and KINS datasets, with average precision reaching 55.8 on SBD. Compared with Deep Snake, model parameters were reduced by 87.2% and computation by 78.3%. Segmentation speed reached 39.8 frames/s on 512×512-pixel images on a 2080Ti GPU. The proposed method reduces resource consumption and performs instance segmentation quickly and accurately, which makes it more suitable for embedded platforms with limited resources.
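Deep Snake treats a contour as a closed cycle of vertices. It therefore uses circular convolution, so the last vertex is a neighbour of the first, when predicting per-vertex offsets. The sketch shows only that circular 1D convolution on scalar vertex features. The learned kernels, feature channels, and iteration schedule are omitted.

```python
def circular_conv1d(features, kernel):
    """Circular 1D convolution over contour vertices (odd-length, centred kernel).

    The index (i - half + k) wraps modulo n, so the contour has no start or end.
    """
    n, half = len(features), len(kernel) // 2
    return [sum(kernel[k] * features[(i - half + k) % n] for k in range(len(kernel)))
            for i in range(n)]


print(circular_conv1d([1, 2, 3, 4], [1, 1, 1]))  # [7, 6, 9, 8]
```

Vertex 0 aggregates vertices 3, 0, and 1. This wrap-around is why a closed contour can be processed like a periodic signal.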
Skin defect inspection is one of the most significant tasks in conventional aircraft inspection. This paper proposes a vision-based method for pixel-level defect detection based on Mask Scoring R-CNN. First, an attention mechanism and a feature fusion module are introduced to improve feature representation. Second, a new classifier head, consisting of four convolutional layers and a fully connected layer, is proposed to reduce the influence of information around the defect area. Third, to evaluate the proposed method, a dataset of aircraft skin defects was constructed, containing 276 images with a resolution of 960×720 pixels. Experimental results show that, for aircraft skin defect inspection, the proposed classifier head improves detection and segmentation accuracy more effectively than the attention mechanism and feature fusion module. Compared with Mask R-CNN and Mask Scoring R-CNN, the proposed method increased segmentation precision by approximately 21% and 19.59%, respectively. These results demonstrate that the proposed method performs favorably against the other two methods for pixel-level aircraft skin defect detection.
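Mask Scoring R-CNN, the base of this method, adds a MaskIoU head. The head is trained to regress the IoU between a predicted mask and its ground truth. At inference, a mask's score is the classification score multiplied by the predicted mask IoU. The sketch computes that training target and the rescoring on toy binary masks. The 0.9 class score is an illustrative value.

```python
def mask_iou(pred, gt):
    """IoU between two binary masks given as equally sized nested lists."""
    pairs = [(p, g) for rp, rg in zip(pred, gt) for p, g in zip(rp, rg)]
    inter = sum(1 for p, g in pairs if p and g)
    union = sum(1 for p, g in pairs if p or g)
    return inter / union if union else 0.0


def mask_score(cls_score, predicted_mask_iou):
    """Mask Scoring R-CNN rescoring: s_mask = s_cls * s_maskiou."""
    return cls_score * predicted_mask_iou


pred = [[1, 1, 0], [0, 1, 0]]
gt = [[1, 0, 0], [0, 1, 1]]
target = mask_iou(pred, gt)    # regression target for the MaskIoU head
print(target)                  # 0.5
print(mask_score(0.9, target)) # 0.45
```

Rescoring ranks a confidently classified but poorly segmented defect below a well-segmented one. This matters for pixel-level defect inspection.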
Mature soybean phenotyping is an important process in soybean breeding; however, manual phenotyping is time-consuming and labor-intensive. A rapid, accurate, and precise approach is therefore required to obtain phenotypic data on soybean stems, pods, and seeds. In this research, we propose a mature soybean phenotype measurement algorithm called Soybean Phenotype Measure-instance Segmentation (SPM-IS). It is based on a feature pyramid network, Principal Component Analysis (PCA), and instance segmentation. We also propose a new method that uses PCA on the image instance segmentation to locate a target object and measure its length and width. After 60,000 iterations, the maximum mean Average Precision (mAP) of the mask and box reached 95.7%. The R² values between manual and SPM-IS measurements of pod length, pod width, stem length, complete main stem length, seed length, and seed width were 0.9755, 0.9872, 0.9692, 0.9803, 0.9656, and 0.9716, respectively. The R² values between manual and SPM-IS counts of pods, stems, and seeds were 0.9733, 0.9872, and 0.9851, respectively. These results show that SPM-IS is a robust measurement and counting algorithm that can reduce labor intensity, improve efficiency, and speed up the soybean breeding process.
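The PCA step can be pictured as follows. Take the pixel coordinates of one instance mask and find the principal axes of their 2×2 covariance matrix. Project the pixels onto each axis, and read the length and width as the extents along the major and minor axes. The sketch returns pixel-centre extents in pixels. A calibration factor to millimetres, which SPM-IS would need, is not shown and would be an assumption.

```python
import math


def pca_length_width(pixels):
    """Length and width of an object from its mask pixel coordinates [(x, y), ...].

    Uses the closed-form major-axis angle of a symmetric 2x2 covariance matrix.
    """
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # major-axis orientation
    ux, uy = math.cos(theta), math.sin(theta)      # major axis
    vx, vy = -uy, ux                               # minor axis (perpendicular)
    major = [(x - mx) * ux + (y - my) * uy for x, y in pixels]
    minor = [(x - mx) * vx + (y - my) * vy for x, y in pixels]
    return max(major) - min(major), max(minor) - min(minor)


# An axis-aligned 10 x 3 block of pixels.
rect = [(x, y) for x in range(10) for y in range(3)]
print(pca_length_width(rect))  # (9.0, 2.0)
```

Because the axes follow the object, a pod lying diagonally is measured along its own length rather than along the image axes.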
We introduce a novel method that uses a new generative model to automatically learn effective representations of target and background appearance in order to detect, segment, and track each instance in a video sequence. Unlike current discriminative tracking-by-detection solutions, our hierarchical structural embedding learning uses normalizing flows to predict higher-quality masks with accurate boundary details over spatio-temporal space. We formulate the instance inference procedure as hierarchical spatio-temporal embedded learning across time and space. Given a video clip, our method first coarsely locates the pixels belonging to a particular instance with a Gaussian distribution. It then builds a novel mixing distribution that refines the instance boundary by fusing hierarchical appearance embedding information in a coarse-to-fine manner. For the mixing distribution, we use a factorized conditional normalizing flow to estimate the distribution parameters and improve segmentation performance. We perform comprehensive qualitative, quantitative, and ablation experiments on three representative video instance segmentation benchmarks (YouTube-VIS19, YouTube-VIS21, and OVIS), and they demonstrate the effectiveness of the proposed method. Moreover, the superior performance of our model on an unsupervised video object segmentation dataset (DAVIS19) demonstrates its generalizability. Our implementation is publicly available at https://github.com/zyqin19/HEVis.
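A normalizing flow is a stack of invertible maps whose log-Jacobian-determinants are cheap to compute. This gives the exact density log p(x) = log p_z(f(x)) + log|det ∂f/∂x|. The sketch shows one generic building block, a RealNVP-style affine coupling layer, with its forward pass, log-determinant, and exact inverse. It is not the paper's factorized conditional flow. The `s_fn` and `t_fn` lambdas are hypothetical stand-ins for learned networks.

```python
import math


def coupling_forward(x, s_fn, t_fn):
    """Affine coupling: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1).

    Returns (y, log_det), where log_det = sum(s(x1)) is log|det J|.
    """
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s, t = s_fn(x1), t_fn(x1)
    y2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
    return x1 + y2, sum(s)


def coupling_inverse(y, s_fn, t_fn):
    """Exact inverse: x2 = (y2 - t(y1)) * exp(-s(y1)), with x1 = y1."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s, t = s_fn(y1), t_fn(y1)
    return y1 + [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y2, s, t)]


# Hypothetical scale and shift "networks" (simple functions for illustration).
s_fn = lambda h: [0.5 * v for v in h]
t_fn = lambda h: [v + 1.0 for v in h]

x = [0.3, -1.2, 2.0, 0.7]
y, log_det = coupling_forward(x, s_fn, t_fn)
x_rec = coupling_inverse(y, s_fn, t_fn)
```

The Jacobian is triangular, so its log-determinant is simply the sum of the scale outputs. This is what makes flows tractable as density estimators.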
Autonomous driving technology has achieved many outstanding results with deep learning, and vehicle detection and classification algorithms have become critical technologies in autonomous driving systems. Vehicle instance segmentation performs instance-level semantic parsing of vehicle information, which is more accurate and reliable than object detection. However, existing instance segmentation algorithms still suffer from poor mask prediction accuracy and low detection speed. Therefore, this paper proposes an advanced real-time instance segmentation model named FIR-YOLACT, which incorporates ICIoU (Improved Complete Intersection over Union) and Res2Net into the YOLACT algorithm. Specifically, the ICIoU function effectively solves the degradation problem of the original CIoU loss function and improves training convergence speed and detection accuracy. A Res2Net module fused with ECA-Net (Efficient Channel Attention) is added to the backbone network, which improves multi-scale detection capability and mask prediction accuracy. Furthermore, the Cluster NMS (Non-Maximum Suppression) algorithm is introduced in the model's bounding box regression to improve the detection of similar, occluded objects. The experimental results demonstrate the superiority of FIR-YOLACT over the baseline methods and the effectiveness of all components. The processing speed reaches 28 FPS, which meets the demands of real-time vehicle instance segmentation.
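For reference, the original CIoU loss that ICIoU modifies combines three terms. The first is 1 − IoU. The second is the normalized squared distance between box centres, ρ²/c², where c is the diagonal of the smallest enclosing box. The third is an aspect-ratio term αv. The sketch implements standard CIoU only; the ICIoU change is not specified in the abstract.

```python
import math


def ciou_loss(pred, gt):
    """Standard CIoU loss for boxes (x1, y1, x2, y2): 1 - IoU + rho^2 / c^2 + alpha * v."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    wp, hp = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (wp * hp + wg * hg - inter)
    # Squared distance between the two box centres.
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 + \
           ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
```

Boxes with the same aspect ratio, such as two squares of different sizes, get v = 0, so the aspect-ratio penalty vanishes. This is one commonly cited way the CIoU term can degrade.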
3D object recognition is a challenging task for intelligent and robotic systems in industrial and home indoor environments. It is critical for such systems to recognize and segment the 3D object instances they frequently encounter. The computer vision, graphics, and machine learning fields have all paid considerable attention to this task. Traditionally, 3D segmentation relied on hand-crafted features and designed approaches that did not achieve acceptable performance and could not generalize to large-scale data. Owing to their great success in 2D computer vision, deep learning approaches have recently become the preferred method for 3D segmentation challenges. However, instance segmentation remains less explored. In this paper, we propose a novel deep learning approach for efficient 3D instance segmentation using red green blue and depth (RGB-D) data. The 2D region-based convolutional neural network (Mask R-CNN) deep learning model with a point-based rendering module is adapted to integrate depth information in order to recognize and segment 3D object instances. To generate 3D point cloud coordinates (x, y, z), the segmented 2D pixels (u, v) of recognized object regions in the RGB image are merged with the corresponding (u, v) points of the depth image. Moreover, we conducted experiments and analysis comparing our proposed method from various viewpoints and distances. The experiments show that the proposed 3D object recognition and instance segmentation are sufficiently beneficial to support object handling in robotic and intelligent systems.
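Merging segmented (u, v) pixels with the depth image to obtain (x, y, z) is usually done with the pinhole camera model: x = (u − cx)·z/fx and y = (v − cy)·z/fy. The sketch assumes a depth image registered to the RGB frame and stored in millimetres. The intrinsics are toy values; real ones come from camera calibration. The abstract does not state which conventions the authors used.

```python
def backproject(pixels, depth, fx, fy, cx, cy, depth_scale=0.001):
    """Map masked pixels (u, v) to camera-frame points (x, y, z) in metres.

    depth[v][u] holds raw depth; depth_scale = 0.001 assumes millimetre units.
    Pixels with zero depth (no measurement) are skipped.
    """
    points = []
    for u, v in pixels:
        z = depth[v][u] * depth_scale
        if z <= 0:
            continue
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points


# Toy 3 x 3 depth image at 1000 mm, with one missing reading at (0, 0).
depth = [[0, 1000, 1000], [1000, 1000, 1000], [1000, 1000, 1000]]
mask_pixels = [(0, 0), (1, 1), (2, 1)]
cloud = backproject(mask_pixels, depth, fx=500.0, fy=500.0, cx=1.0, cy=1.0)
```

Applying this to the pixels of one instance mask yields that object's partial point cloud, which a grasp planner can then consume.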
Different objects in Chinese paintings carry rich cultural connotations. Segmenting and extracting these objects through technical methods is an effective way to enhance cultural added value and activate cultural resources. Existing deep learning methods can extract multi-level features for instance segmentation. However, they do not fully utilize the location relationship features of instances, which leads to poor segmentation results for traditional Chinese painting (TCP) images. In this paper, a novel TCP image instance segmentation algorithm that integrates spatial structure characteristics (SSC), called SSC-Net, is proposed. First, TCP images have characteristics such as gradual color blending and discontinuous contour lines. To account for them, an instance information entropy is proposed, composed of a color entropy formed by regional variance and a contour entropy formed by contour point regression. Then, existing network structures struggle to fully account for the location relationship features of instances in TCP images. To address this, a Chinese painting instance segmentation framework based on the residual neural network (ResNet) structure is proposed. It consists of a mask branch and a position branch that can integrate spatial structure features. Finally, the color entropy and contour entropy are input into the mask branch and position branch of SSC-Net, respectively, to realize TCP instance segmentation. Quantitative and qualitative experiments on the challenging TCP database show that, compared with state-of-the-art algorithms in the same category, SSC-Net achieves good results, with an average precision (AP) of 53.89% at 25.8 frames per second (FPS). The segmentation results meet practical application requirements.
To enable efficient, low-cost automated apple harvesting, this study presents a multi-class instance segmentation model, SCAL (Star-CAA-LADH), which uses a single RGB sensor for image acquisition. The model accurately segments fruits, fruit-bearing branches, and main branches from a single RGB image, providing comprehensive visual input for robotic harvesting. A Star-CAA module was proposed by integrating the Star operation with a Context-Anchored Attention (CAA) mechanism, which enhances directional sensitivity and multi-scale feature perception. The Backbone and Neck networks were equipped with hierarchically structured SCA-T/F modules to improve the fusion of high- and low-level features, resulting in more continuous masks and sharper boundaries. In the Head network, a Segment_LADH module was employed to optimize classification, bounding box regression, and mask generation, improving segmentation accuracy for small and adherent targets. To enhance robustness in adverse weather, a Chain-of-Thought Prompted Adaptive Enhancer (CPA) module was integrated, increasing model resilience in degraded environments. Experimental results show that SCAL achieves 94.9% AP_M and 95.1% mAP_M, outperforming YOLOv11s by 6.6% and 4.6%, respectively. Under multi-weather testing conditions, the CPA-SCAL variant consistently outperforms the other comparison models in accuracy. After INT8 quantization, the model size was reduced to 14.5 MB, with an inference speed of 47.2 frames per second (fps) on the NVIDIA Jetson AGX Xavier. Experiments in simulated orchard environments validate the effectiveness and generalization capability of SCAL. They demonstrate its suitability as an efficient and comprehensive visual solution for intelligent harvesting in complex agricultural settings.
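INT8 quantization stores each weight in one byte instead of four (for float32), which is where most of the size reduction comes from. The sketch shows symmetric per-tensor weight quantization: scale = max|w| / 127, q = clamp(round(w / scale), −127, 127). Deployment toolchains typically also calibrate activation ranges and may quantize per channel. The abstract does not say which scheme was used, so this is an illustration only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization of a list of floats.

    Returns (q, scale) with q in [-127, 127] and w ~= q * scale.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    return [qi * scale for qi in q]


w = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

Each reconstructed weight is within half a quantization step (scale / 2) of the original. This bounded error is why accuracy often survives INT8 deployment on devices such as the Jetson.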
Efficient and accurate segmentation of complex microstructures is a critical challenge in establishing process-structure-property (PSP) linkages of materials. Deep learning (DL)-based instance segmentation algorithms show potential for achieving this goal. However, to ensure prediction reliability, current algorithms usually have complex structures and demand vast amounts of training data. To overcome this model complexity and dependence on data volume, we developed a DL framework based on a simple method called dual-layer semantics. In the framework, a data standardization module removes extraneous microstructural noise and accentuates the desired structural characteristics. A post-processing module further improves segmentation accuracy. The framework was successfully applied to a small dataset of bimodal Ti-6Al-4V microstructures with only 112 samples. Compared with the ground truth, it achieves an IoU accuracy of 86.81% for the globular α phase and an average size distribution similarity of 94.70% for the colony structures. More importantly, it took only 36 s to process a 1024 × 1024 micrograph, much faster than experienced experts (usually 900 s). The framework proved reliable, interpretable, and scalable, enabling its use on complex microstructures to deepen the understanding of PSP linkages.
Edible mushrooms are rich in nutrients; however, harvesting mainly relies on manual labor. Coarse localization of each mushroom is necessary for a robotic arm to pick edible mushrooms accurately. Previous studies used detection algorithms that did not consider pixel-level mushroom information, so this information is lost when these algorithms are combined with a depth map. Moreover, among instance segmentation algorithms, convolutional neural network (CNN)-based methods are lightweight, but the features they extract are not correlated. To guarantee real-time location detection and improve the accuracy of mushroom segmentation, this study proposes a new spatial-channel transformer network model based on Mask-RCNN (SCT-Mask-RCNN). Fusing Mask-RCNN with a self-attention mechanism extracts global correlations among image features along the channel and spatial dimensions. Mask-RCNN then maintains a lightweight structure and extracts local features with a spatial pyramid pooling structure, achieving multiscale local feature fusion and improving detection accuracy. The results show that SCT-Mask-RCNN achieved a segmentation accuracy of 0.750 on segm_Precision_mAP and a detection accuracy of 0.638 on Bbox_Precision_mAP. Compared with existing methods, the proposed method improved Bbox_Precision_mAP and segm_Precision_mAP by over 2% and 5%, respectively.
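Self-attention "along the spatial and channel dimensions" can be read as the same scaled dot-product attention applied in two ways. It can run over the H·W positions of an (H·W)×C feature matrix, or over the C channels of its transpose. The sketch makes one simplifying assumption: the learned query, key, and value projections are replaced by identity, so only the attention mechanics remain. It is not the SCT-Mask-RCNN module itself.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """softmax(X X^T / sqrt(d)) X with identity projections (simplification).

    tokens: N x d nested lists; each output row is a convex mix of all rows.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        out.append([sum(wi * tok[j] for wi, tok in zip(w, tokens)) for j in range(d)])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


# 3 spatial positions x 2 channels.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spatial = self_attention(feats)             # attends across positions: 3 x 2
channel = self_attention(transpose(feats))  # attends across channels: 2 x 3
```

Every output position mixes information from all other positions. This is the global correlation that purely local convolutions lack.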
The process of segmenting point cloud data into several homogeneous regions, in which points in the same region share the same attributes, is known as 3D segmentation. Segmenting point cloud data is challenging because of substantial redundancy, fluctuating sample density, and the lack of explicit organization. The research area has a wide range of robotics applications, including intelligent vehicles, autonomous mapping, and navigation, and many researchers have introduced methodologies and algorithms for it. Deep learning, the prevailing AI approach, has been applied successfully across a wide range of 2D vision domains. However, because of the specific problems of processing point clouds with deep neural networks, deep learning on point clouds is still at an early stage. This study examines the many strategies proposed for 3D instance and semantic segmentation and gives a complete assessment of current developments in deep learning-based 3D segmentation. The benefits, drawbacks, and design mechanisms of these approaches are studied and discussed. The study also compares how various segmentation algorithms perform on publicly accessible datasets and reviews the most commonly used pipelines, their advantages and limitations, key findings, and promising future research directions.
In real traffic scenarios, precise recognition of traffic participants such as vehicles and pedestrians is crucial for intelligent transportation. This study proposes an improved algorithm built on Mask-RCNN to enhance the ability of autonomous driving systems to recognize traffic participants. The algorithm incorporates long short-term memory (LSTM) networks and a fused attention module (GSAM, GCT, and Spatial Attention Module) to strengthen its capability to process both global and local information. Additionally, to improve the network's initial stability, the original activation function was replaced with the Gaussian error linear unit (GELU). Experiments were conducted on the publicly available Cityscapes dataset. The revised algorithm outperformed the original on AP50, AP75, and other metrics, improving AP50 and AP75 by 8.7% and 9.6% for object detection and by 12.5% and 13.3% for segmentation.
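The abstract above replaces the network's original activation with the Gaussian error linear unit (GELU). As a reference point only, the sketch below gives the standard GELU definition, GELU(x) = x·Φ(x) with Φ the standard normal CDF, together with the widely used tanh approximation. ReLU is included purely as a common baseline for comparison; the abstract does not name the activation that was replaced, and this is not the authors' code.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Widely used tanh approximation of GELU."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def relu(x: float) -> float:
    """ReLU, shown only as a comparison baseline."""
    return max(0.0, x)


for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={v:+.1f}  relu={relu(v):.4f}  gelu={gelu(v):.4f}  tanh-approx={gelu_tanh(v):.4f}")
```

Unlike ReLU, GELU is smooth everywhere and lets small negative inputs through with a reduced magnitude (GELU(-0.5) is about -0.154), which is one reason it is often preferred for training stability.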
Funding: Supported in part by the National Natural Science Foundation of China (Grant No. 62062003) and the Natural Science Foundation of Ningxia (Grant No. 2023AAC03293).
Abstract: Instance segmentation of impacted teeth in oral panoramic X-ray images is an active research topic. However, because teeth in panoramic X-ray images have complex structures, low contrast, and complex backgrounds, instance segmentation is technically challenging. In particular, the contrast between impacted teeth and periodontal tissues such as the gingiva, periodontal membrane, and alveolar bone is low, which blurs the boundaries of impacted teeth. A model based on Teeth YOLACT is proposed to provide a more efficient and accurate solution for segmenting impacted teeth in oral panoramic X-ray films. First, a Multi-scale Res-Transformer Module (MRTM) is designed. In this module, depthwise separable convolutions with different receptive fields make the model more sensitive to lesion size, and a Vision Transformer is integrated to improve the model's ability to perceive global features. Second, a Context Interaction-awareness Module (CIaM) is designed to fuse deep and shallow features. The deep semantic features guide the shallow spatial features; the shallow spatial features are then embedded into the deep semantic features, and a cross-weighted attention mechanism aggregates the deep and shallow features efficiently to obtain richer context information. Third, an Edge-preserving Perception Module (E2PM) is designed to enhance tooth edge features. A first-order differential operator is used to obtain tooth edge weights, improving the perception of tooth edge features, and the shallow spatial features are fused through linear mapping, weight concatenation, and matrix multiplication to preserve tooth edge information. Finally, comparison and ablation experiments are conducted on oral panoramic X-ray image datasets. The results show that the APdet, APseg, ARdet, ARseg, mAPdet, and mAPseg of the proposed model are 89.9%, 91.9%, 77.4%, 77.6%, 72.8%, and 73.5%, respectively. This study further demonstrates the application potential of combining multi-scale feature extraction, multi-scale feature fusion, and edge perception enhancement in medical image segmentation, and provides a useful reference for future related research.
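The E2PM description says a first-order differential operator yields tooth edge weights. The abstract does not name the operator or the normalization, so the following is a minimal sketch assuming a 3×3 Sobel operator, gradient magnitude as edge strength, and division by the maximum to map weights into [0, 1]. It uses plain Python for readability and is not the E2PM module itself.

```python
import math

# Assumed first-order operator: 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def edge_weight(img):
    """Return a [0, 1] edge-weight map from Sobel gradient magnitude.

    img is a list of equal-length rows of grayscale values. Border pixels
    reuse the nearest valid pixel (coordinate clamping).
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    v = img[yy][xx]
                    gx += SOBEL_X[dy + 1][dx + 1] * v
                    gy += SOBEL_Y[dy + 1][dx + 1] * v
            mag[y][x] = math.hypot(gx, gy)
    peak = max(max(row) for row in mag)
    if peak == 0:
        return mag  # flat image: no edges anywhere
    return [[m / peak for m in row] for row in mag]


# A dark region (0) beside a bright "tooth" region (100): weights peak at the boundary.
image = [[0, 0, 100, 100] for _ in range(4)]
print(edge_weight(image)[0])
```

How E2PM then fuses such weights with the shallow spatial features is not described in enough detail to sketch here.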
Funding: Supported by the Science and Technology Research Youth Project of Chongqing Municipal Education Commission (No. KJQN202301104), the Cooperative Project between Universities in Chongqing and Affiliated Institutes of the Chinese Academy of Sciences (No. HZ2021011), the Chongqing Municipal Science and Technology Commission Technology Innovation and Application Development Special Project (No. 2022TIAD-KPX0040), and the Action Plan for Quality Development of Chongqing University of Technology Graduate Education (Grant No. gzlcx20242014).
Abstract: Instance segmentation is crucial in various domains, such as autonomous driving and robotics. However, the detection speed of instance segmentation algorithms on edge devices leaves room for improvement, so it is essential to increase detection speed while maintaining high accuracy. In this study, we propose You Only Look Once-Layer Fusion (YOLO-LF), a lightweight instance segmentation method specifically designed to optimize instance segmentation speed for autonomous driving applications. Based on the You Only Look Once version 8 nano (YOLOv8n) framework, we introduce a lightweight convolutional module and design a lightweight layer aggregation module called Reparameterization convolution and Partial convolution Efficient Layer Aggregation Networks (RPELAN). This module reduces the impact that redundant information from traditional convolutional stacking has on network size and detection speed, while improving the capability to process feature information. We also experimentally verified that our general lightweighting method for one-stage detection networks, based on Grouped Spatial Convolution (GSconv), increases detection speed while maintaining accuracy across various state-of-the-art (SOTA) networks. Experiments on the publicly available Cityscapes dataset demonstrated that, compared with YOLOv8n, YOLO-LF maintained the same accuracy (mAP@0.5 of 37.9%), reduced the model size by 14.3% from 3.259 M to 2.804 M, and increased the frames per second (FPS) by 14.48% from 57.47 to 65.79, demonstrating its potential for real-time instance segmentation on edge devices.
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 31470714 and 61701105).
Abstract: Tree trunk instance segmentation is crucial for under-canopy unmanned aerial vehicles (UAVs) to autonomously extract standing tree stem attributes. Using cameras as sensors keeps these UAVs compact and lightweight, facilitating safe and flexible navigation in dense forests. However, their limited onboard computational power makes real-time, image-based tree trunk segmentation challenging, emphasizing the urgent need for lightweight and efficient segmentation models. In this study, we present RT-Trunk, a model specifically designed for real-time tree trunk instance segmentation in complex forest environments. To ensure real-time performance, we selected SparseInst as the base framework. We incorporated ConvNeXt-T as the backbone to enhance feature extraction for tree trunks, thereby improving segmentation accuracy. We further integrated the lightweight convolutional block attention module (CBAM), enabling the model to focus on tree trunk features while suppressing irrelevant information, which leads to additional gains in segmentation accuracy. To enable RT-Trunk to operate effectively in diverse, complex forest environments, we constructed a comprehensive dataset for training and testing by combining self-collected data with multiple public datasets covering different locations, seasons, weather conditions, tree species, and levels of forest clutter. Compared with other tree trunk segmentation methods, RT-Trunk achieved an average precision of 91.4% and the fastest inference speed, 32.9 frames per second. Overall, RT-Trunk provides superior trunk segmentation performance that balances speed and accuracy, making it a promising solution for supporting under-canopy UAVs in the autonomous extraction of standing tree stem attributes. The code for this work is available at https://github.com/NEFU CVRG/RT Trunk.
Funding: Funded by the Anhui Provincial Natural Science Foundation (No. 2208085ME128), the Anhui University-Level Special Project of Anhui University of Science and Technology (No. XCZX2021-01), the Research and Development Fund of the Institute of Environmental Friendly Materials and Occupational Health, Anhui University of Science and Technology (No. ALW2022YF06), and the Anhui Province New Era Education Quality Project (Graduate Education) (No. 2022xscx073).
Abstract: Real-time detection and instance segmentation of strawberries are fundamental components in the development of strawberry harvesting robots. Identifying strawberries in real time in an unstructured environment is challenging, and current instance segmentation algorithms for strawberries suffer from poor real-time performance and low accuracy. To this end, the present study proposes an Efficient YOLACT (E-YOLACT) algorithm for strawberry detection and segmentation based on the YOLACT framework. The key enhancements of E-YOLACT are as follows. A lightweight attention mechanism, pyramid squeeze shuffle attention (PSSA), is developed for efficient feature extraction. An attention-guided context-feature pyramid network (AC-FPN) replaces the FPN to optimize the architecture's performance. A feature-enhanced model (FEM) is introduced to strengthen the prediction head, and efficient fast non-maximum suppression (EF-NMS) is devised to improve non-maximum suppression. The experimental results demonstrate that E-YOLACT achieves a Box-mAP of 77.9 and a Mask-mAP of 76.6 on the custom dataset, with a category accuracy of 93.5%. E-YOLACT also runs in real time at 34.8 FPS. The proposed method offers an efficient approach for the vision system of a strawberry-picking robot.
Funding: This research was supported by the National Natural Science Foundation of China (No. 62276086), the National Key R&D Program of China (No. 2022YFD2000100), and the Zhejiang Provincial Natural Science Foundation of China (Grant No. LTGN23D010002).
Abstract: Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea. Traditional tea-picking machines may compromise the quality of the tea leaves. High-quality teas are often hand-picked and require more delicate operation from intelligent picking machines. Compared with traditional image processing techniques, deep learning models have stronger feature extraction and generalization capabilities and are more suitable for practical tea shoot harvesting. However, current research mostly focuses on shoot detection and cannot directly accomplish end-to-end shoot segmentation. We propose a tea shoot instance segmentation model based on multi-scale mixed attention (Mask2FusionNet), using a dataset from a tea garden in Hangzhou. Our analysis of the tea shoot dataset showed that small- and medium-sized targets account for 89.9% of all targets. Compared with several mainstream object segmentation algorithms, our model achieves an accuracy of 82% in recognizing tea shoots, outperforming the other models. Ablation experiments showed that ResNet50, the PointRend strategy, and the Feature Pyramid Network (FPN) architecture improve performance by 1.6%, 1.4%, and 2.4%, respectively. These experiments demonstrate that the proposed multi-scale and point selection strategy improves feature extraction for overlapping small targets. The results indicate that Mask2FusionNet can perform shoot segmentation in unstructured environments, distinguishing individual tea shoots and completely extracting shoot edge contours with a segmentation accuracy of 82.0%. The research results can provide algorithmic support for the segmentation and intelligent harvesting of premium tea shoots at different scales.
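The abstract reports that 89.9% of the tea shoot targets are small or medium, without stating the size criterion. The sketch below shows how such a share could be computed under an assumed criterion, the COCO area thresholds (small: area < 32² pixels; medium: 32² ≤ area < 96² pixels). Both the thresholds and the example areas are assumptions for illustration, not the paper's data or method.

```python
from collections import Counter
from typing import Iterable

# Assumed COCO-style thresholds on instance mask area, in pixels.
SMALL_MAX = 32 ** 2   # area < 1024 px          -> "small"
MEDIUM_MAX = 96 ** 2  # 1024 <= area < 9216 px  -> "medium"


def size_bucket(area_px: float) -> str:
    """Classify one instance by its pixel area."""
    if area_px < SMALL_MAX:
        return "small"
    if area_px < MEDIUM_MAX:
        return "medium"
    return "large"


def small_medium_share(areas: Iterable[float]) -> float:
    """Fraction of instances that fall in the small or medium bucket."""
    counts = Counter(size_bucket(a) for a in areas)
    total = sum(counts.values())
    return (counts["small"] + counts["medium"]) / total if total else 0.0


# Hypothetical mask areas, for illustration only.
print(small_medium_share([500, 2000, 10000, 9215, 1024]))  # 0.8
```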
Funding: The results and knowledge included herein were obtained with support from the following institutional grant: Internal Grant Agency of the Faculty of Economics and Management, Czech University of Life Sciences Prague, Grant No. 2023A0004, “Text Segmentation Methods of Historical Alphabets in OCR Development” (https://iga.pef.czu.cz/). Funds were granted to T. Novák, A. Hamplová, O. Svojše, and A. Veselý from the author team.
Abstract: This study presents single-class and multi-class instance segmentation approaches applied to ancient Palmyrene inscriptions, employing two state-of-the-art deep learning algorithms, YOLOv8 and Roboflow 3.0. The goal is to contribute to the preservation and understanding of historical texts and to showcase the potential of modern deep learning methods in archaeological research. Our research yields several key findings and scientific contributions. We comprehensively compare the performance of YOLOv8 and Roboflow 3.0 for Palmyrene character segmentation, focusing mainly on the strengths and weaknesses of each algorithm in this context. We also created and annotated an extensive dataset of Palmyrene inscriptions, a crucial resource for further research in the field, which is used for training and evaluating the segmentation models. We use comparative evaluation metrics to quantitatively assess the segmentation results, ensuring the reliability and reproducibility of our findings, and we present custom visualization tools for predicted segmentation masks. Our study advances the state of the art in semi-automatic reading of Palmyrene inscriptions and establishes a benchmark for future research. The availability of the Palmyrene dataset and the insights into algorithm performance contribute to the broader understanding of historical text analysis.
Funding: Funded by the National Natural Science Foundation of China (No. 62062003) and the Ningxia Natural Science Foundation Project (No. 2023AAC03293).
Abstract: Precise detection and segmentation of tumor lesions are very important for computer-aided diagnosis of lung cancer. However, in PET/CT (Positron Emission Tomography/Computed Tomography) lung images, lesion shapes are complex, edges are blurred, and sample numbers are imbalanced. To solve these problems, this paper proposes a Multi-branch Cross-scale Interactive Feature fusion Transformer model (MCIF-Transformer Mask RCNN) for PET/CT lung tumor instance segmentation. The main contributions of this paper are as follows. First, a ResNet-Transformer backbone network extracts global and local features from lung images, and pixel dependencies are established in local and non-local fields to improve the model's perception ability. Second, a Cross-scale Interactive Feature Enhancement auxiliary network is designed to supply shallow features to the deep features, and a cross-scale interactive feature enhancement module (CIFEM) strengthens attention to fine-grained features. Third, a Cross-scale Interactive Feature fusion FPN network (CIF-FPN) is constructed to realize bidirectional interactive fusion between deep and shallow features, enhancing the low-level features within the deep semantic features. Finally, 4 ablation experiments, 3 detection comparison experiments, 3 segmentation comparison experiments, and 6 comparison experiments against two-stage and single-stage instance segmentation networks were conducted on PET/CT lung medical image datasets. The results showed that the APdet, APseg, ARdet, and ARseg indexes improved by 5.5%, 5.15%, 3.11%, and 6.79%, respectively, compared with Mask RCNN (ResNet50). The method thus achieves precise detection and segmentation of lesion regions and is of practical value for lung tumor detection.
Funding: Supported by the National Natural Science Foundation of China (No. 62063006), the Natural Science Foundation of Guangxi Province (No. 2023GXNSFAA026025), the Innovation Fund of Chinese Universities Industry-University-Research (ID: 2021RYC06005), the Research Project for Young and Middle-Aged Teachers in Guangxi Universities (ID: 2020KY15013), and the Special Research Project of Hechi University (ID: 2021GCC028). Also financially supported by the Project of Outstanding Thousand Young Teachers’ Training in Higher Education Institutions of Guangxi, and by the Guangxi Colleges and Universities Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region.
Abstract: Dynamic Simultaneous Localization and Mapping (SLAM) in visual scenes is currently a major research area in fields such as robot navigation and autonomous driving. However, in complex real-world environments, current dynamic SLAM systems struggle to achieve precise localization and map construction. With the advancement of deep learning, interest in deep learning-based dynamic SLAM visual odometry has grown in recent years, and more researchers are turning to deep learning techniques to address the challenges of dynamic SLAM. Compared with dynamic SLAM systems based on deep learning methods such as object detection and semantic segmentation, dynamic SLAM systems based on instance segmentation can not only detect dynamic objects in the scene but also distinguish different instances of the same object type, thereby reducing the impact of dynamic objects on the SLAM system's localization. This article introduces traditional dynamic SLAM systems based on mathematical models and also provides a comprehensive analysis of existing instance segmentation algorithms and of dynamic SLAM systems based on instance segmentation, comparing and summarizing their advantages and disadvantages. Comparisons on datasets show that instance segmentation-based methods have significant advantages in accuracy and robustness in dynamic environments. However, the limited real-time performance of instance segmentation algorithms hinders the widespread application of such dynamic SLAM systems. In recent years, the rapid development of single-stage instance segmentation methods has brought hope for the widespread application of instance segmentation-based dynamic SLAM systems. Finally, possible future research directions and improvement measures are discussed for reference by relevant professionals.
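The survey notes that instance segmentation reduces the influence of dynamic objects on SLAM localization. One common way this is done is to discard feature points that fall on instances of movable classes before pose estimation. The sketch below is a simplified illustration of that idea, not any particular system from the survey; the mask layout, the class names, and the choice of which classes count as dynamic are all hypothetical. Practical systems usually add geometric consistency checks, since a "person" may be standing still and a "chair" may be moved.

```python
from typing import List, Sequence, Set, Tuple

Keypoint = Tuple[float, float]  # (u, v) pixel coordinates
Mask = List[List[bool]]         # mask[row][col] is True inside the instance


def filter_static_keypoints(
    keypoints: Sequence[Keypoint],
    masks: Sequence[Mask],
    labels: Sequence[str],
    dynamic_classes: Set[str],
) -> List[Keypoint]:
    """Keep only keypoints that do not lie on an instance of a dynamic class."""
    kept: List[Keypoint] = []
    for u, v in keypoints:
        col, row = int(u), int(v)  # assumes keypoints lie inside the image
        on_dynamic = any(
            label in dynamic_classes and mask[row][col]
            for mask, label in zip(masks, labels)
        )
        if not on_dynamic:
            kept.append((u, v))
    return kept


# Toy 3x4 frame: a "person" covers columns 0-1, a "chair" covers column 3.
person = [[True, True, False, False] for _ in range(3)]
chair = [[False, False, False, True] for _ in range(3)]
points = [(0.5, 1.2), (2.0, 0.0), (3.0, 2.0)]
static = filter_static_keypoints(points, [person, chair], ["person", "chair"], {"person"})
print(static)  # [(2.0, 0.0), (3.0, 2.0)]
```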
Funding: Supported by the National Key Research and Development Program (No. 2022YFE0112400), the National Natural Science Foundation of China (No. 21706096), and the Natural Science Foundation of Jiangsu Province (No. BK20160162).
Abstract: Instance segmentation plays an important role in image processing. The Deep Snake algorithm, which is based on contour iteration, deforms an initial bounding box into an instance contour end-to-end. This can improve instance segmentation performance, but the algorithm suffers from slow segmentation and a sub-optimal initial contour. To solve these problems, a real-time instance segmentation algorithm based on contour learning was proposed. First, ShuffleNet V2 was used as the backbone network, and the model's receptive field was expanded with a 5×5 convolution kernel. Second, a lightweight up-sampling module, multi-stage aggregation (MSA), performs residual fusion of multi-layer features, which both improves segmentation speed and extracts effective features more comprehensively. Third, a network-learned contour initialization method was designed, in which a global contour feature aggregation mechanism regresses a coarse contour, solving the problem of excessive error between a manually initialized contour and the real contour. Finally, the Snake deformation module iteratively refines the coarse contour into the final instance contour. The experimental results showed that the proposed method improved instance segmentation accuracy on the Semantic Boundaries Dataset (SBD), Cityscapes, and Kins, with the average precision reaching 55.8 on SBD. Compared with Deep Snake, the model parameters were reduced by 87.2% and the computation by 78.3%, and the segmentation speed reached 39.8 frames per second on 512×512-pixel images on a 2080Ti GPU. The proposed method reduces resource consumption and performs instance segmentation quickly and accurately, making it well suited to resource-constrained embedded platforms.
Funding: National Natural Science Foundation of China (Nos. U2033201 and U1633105).
Abstract: Skin defect inspection is one of the most important tasks in conventional aircraft inspection. This paper proposes a vision-based method for pixel-level defect detection based on Mask Scoring R-CNN. First, an attention mechanism and a feature fusion module are introduced to improve feature representation. Second, a new classifier head, consisting of four convolutional layers and a fully connected layer, is proposed to reduce the influence of information surrounding the defect area. Third, to evaluate the proposed method, a dataset of aircraft skin defects was constructed, containing 276 images with a resolution of 960×720 pixels. Experimental results show that, for aircraft skin defect inspection, the proposed classifier head improves detection and segmentation accuracy more than the attention mechanism and the feature fusion module do. Compared with Mask R-CNN and Mask Scoring R-CNN, the proposed method increased segmentation precision by approximately 21% and 19.59%, respectively. These results demonstrate that the proposed method performs favorably against the other two methods for pixel-level aircraft skin defect detection.
Funding: Supported by the National Natural Science Foundation of China (31400074, 31471516, 31271747, and 30971809), the Natural Science Foundation of Heilongjiang Province of China (ZD201213), and the Heilongjiang Postdoctoral Science Foundation (LBH-Q18025).
Abstract: Mature soybean phenotyping is an important process in soybean breeding; however, the manual process is time-consuming and labor-intensive. A rapid, accurate, and highly precise approach is therefore required to obtain phenotypic data on soybean stems, pods, and seeds. In this research, we propose a mature soybean phenotype measurement algorithm called Soybean Phenotype Measure-instance Segmentation (SPM-IS). SPM-IS is based on a feature pyramid network, Principal Component Analysis (PCA), and instance segmentation. We also propose a new method that uses PCA to locate a target object and measure its length and width from image instance segmentation. After 60,000 iterations, the maximum mean Average Precision (mAP) of the mask and box reached 95.7%. The correlation coefficients R² between manual and SPM-IS measurements of pod length, pod width, stem length, complete main stem length, seed length, and seed width were 0.9755, 0.9872, 0.9692, 0.9803, 0.9656, and 0.9716, respectively. The correlation coefficients R² between manual and SPM-IS counts of pods, stems, and seeds were 0.9733, 0.9872, and 0.9851, respectively. These results show that SPM-IS is a robust measurement and counting algorithm that can reduce labor intensity, improve efficiency, and speed up the soybean breeding process.
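SPM-IS uses PCA on instance masks to locate objects and measure their length and width, but the abstract does not give the exact procedure. The sketch below shows one straightforward reading under stated assumptions: PCA on the mask's pixel coordinates, length as the extent along the first principal axis, and width as the extent along the orthogonal axis, both measured between pixel centres. `mm_per_pixel` is a hypothetical calibration factor, and this is not the authors' implementation.

```python
import math
from typing import Iterable, Tuple


def pca_length_width(
    pixels: Iterable[Tuple[float, float]], mm_per_pixel: float = 1.0
) -> Tuple[float, float, float]:
    """Estimate (length, width, angle_in_degrees) of a mask from its pixel coordinates."""
    pts = list(pixels)
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    # Entries of the 2x2 covariance matrix of the pixel coordinates.
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    # Closed-form angle of the major eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)  # first principal axis
    vx, vy = -uy, ux                           # second (orthogonal) axis
    along = [(x - mx) * ux + (y - my) * uy for x, y in pts]
    across = [(x - mx) * vx + (y - my) * vy for x, y in pts]
    length = (max(along) - min(along)) * mm_per_pixel
    width = (max(across) - min(across)) * mm_per_pixel
    return length, width, math.degrees(theta)


# An axis-aligned 11 x 3 block of pixels: centre-to-centre extents are 10 and 2.
block = [(x, y) for x in range(11) for y in range(3)]
print(pca_length_width(block))  # (10.0, 2.0, 0.0)
```

Because the axes come from the data rather than the image grid, a pod lying diagonally is measured along its own long axis instead of its bounding box.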
Funding: Supported in part by the National Natural Science Foundation of China (62176139, 62106128, 62176141), the Major Basic Research Project of Shandong Natural Science Foundation (ZR2021ZD15), the Natural Science Foundation of Shandong Province (ZR2021QF001), the Young Elite Scientists Sponsorship Program by CAST (2021QNRC001), the Open Project of Key Laboratory of Artificial Intelligence, Ministry of Education, the Shandong Provincial Natural Science Foundation for Distinguished Young Scholars (ZR2021JQ26), and the Taishan Scholar Project of Shandong Province (tsqn202103088).
Abstract: We introduce a novel method using a new generative model that automatically learns effective representations of target and background appearance to detect, segment, and track each instance in a video sequence. Unlike current discriminative tracking-by-detection solutions, our hierarchical structural embedding learning uses normalizing flows to predict higher-quality masks with accurate boundary details over spatio-temporal space. We formulate the instance inference procedure as hierarchical spatio-temporal embedding learning across time and space. Given a video clip, our method first coarsely locates the pixels belonging to a particular instance with a Gaussian distribution and then builds a novel mixing distribution that refines the instance boundary by fusing hierarchical appearance embedding information in a coarse-to-fine manner. For the mixing distribution, we use a factorized conditional normalizing flow to estimate the distribution parameters and improve segmentation performance. Comprehensive qualitative, quantitative, and ablation experiments on three representative video instance segmentation benchmarks (YouTube-VIS19, YouTube-VIS21, and OVIS) demonstrate the effectiveness of the proposed method. Moreover, the superior performance of our model on an unsupervised video object segmentation dataset (DAVIS19) shows its generalizability. Our algorithm implementations are publicly available at https://github.com/zyqin19/HEVis.
Funding: Supported by the Natural Science Foundation of Guizhou Province (Grant No. 20161054), the Joint Natural Science Foundation of Guizhou Province (Grant No. LH20177226), the 2017 Special Project of New Academic Talent Training and Innovation Exploration of Guizhou University (Grant No. 20175788), and the National Natural Science Foundation of China (Grant No. 12205062).
Abstract: Autonomous driving technology has made significant progress with deep learning, and vehicle detection and classification algorithms have become critical technologies for autonomous driving systems. Vehicle instance segmentation performs instance-level semantic parsing of vehicle information, which is more accurate and reliable than object detection. However, existing instance segmentation algorithms still suffer from poor mask prediction accuracy and low detection speed. This paper therefore proposes an advanced real-time instance segmentation model named FIR-YOLACT, which fuses ICIoU (Improved Complete Intersection over Union) and Res2Net into the YOLACT algorithm. Specifically, the ICIoU function effectively solves the degradation problem of the original CIoU loss function and improves training convergence speed and detection accuracy. A Res2Net module fused with ECA (Efficient Channel Attention) Net is added to the model's backbone network, improving multi-scale detection capability and mask prediction accuracy. Furthermore, the Cluster NMS (Non-Maximum Suppression) algorithm is introduced into the model's bounding box regression to improve the detection of similar, occluded objects. The experimental results demonstrate the superiority of FIR-YOLACT over the baseline methods and the effectiveness of all components. The processing speed reaches 28 FPS, which meets the demands of real-time vehicle instance segmentation.
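FIR-YOLACT adopts Cluster NMS for box suppression. The sketch below follows the published Cluster-NMS formulation: sort boxes by score, build an upper-triangular IoU matrix, and repeatedly let only the currently kept boxes suppress lower-scored ones until the keep vector stops changing, which yields the same boxes as greedy NMS. Plain Python loops are used for clarity, whereas the practical appeal of Cluster-NMS is that each iteration is a batched matrix operation on the GPU. The box format and threshold are assumptions, and this is not the FIR-YOLACT code.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cluster_nms(boxes: Sequence[Box], scores: Sequence[float], thresh: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    # Upper-triangular IoU matrix in score order: x[i][j] = IoU(i, j) for i < j, else 0.
    x = [[iou(boxes[order[i]], boxes[order[j]]) if i < j else 0.0 for j in range(n)]
         for i in range(n)]
    keep = [True] * n  # start by assuming every box is kept
    for _ in range(n + 1):  # the fixed point is reached within n iterations
        # Only boxes that are currently kept may suppress lower-scored boxes.
        new_keep = [
            max((x[i][j] for i in range(n) if keep[i]), default=0.0) < thresh
            for j in range(n)
        ]
        if new_keep == keep:
            break
        keep = new_keep
    return [order[j] for j in range(n) if keep[j]]


# Three boxes in a row: box 0 overlaps box 1, box 1 overlaps box 2, box 0 barely overlaps box 2.
boxes = [(0, 0, 10, 10), (3, 0, 13, 10), (6, 0, 16, 10)]
scores = [0.9, 0.8, 0.7]
print(cluster_nms(boxes, scores))  # [0, 2]
```

A single pass over the matrix (the Fast NMS used in YOLACT) would also drop box 2, because the already-suppressed box 1 would still count against it; iterating restores box 2, matching greedy NMS.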
Abstract: 3D object recognition is a challenging task for intelligent and robotic systems in industrial and home indoor environments. Such systems must recognize and segment the 3D object instances they frequently encounter. The computer vision, graphics, and machine learning fields have all given this problem considerable attention. Traditionally, 3D segmentation relied on hand-crafted features and engineered approaches that did not achieve acceptable performance and could not be generalized to large-scale data. Owing to their great success in 2D computer vision, deep learning approaches have recently become the preferred method for 3D segmentation. However, 3D instance segmentation remains less explored. In this paper, we propose a novel deep learning approach for efficient 3D instance segmentation using red green blue and depth (RGB-D) data. The 2D region-based convolutional neural network (Mask R-CNN) model with a point-based rendering module is adapted to integrate depth information in order to recognize and segment 3D object instances. To generate 3D point cloud coordinates (x, y, z), the segmented 2D pixels (u, v) of recognized object regions in the RGB image are mapped to the corresponding (u, v) points of the depth image. Moreover, we conducted experiments and analysis evaluating the proposed method from various viewpoints and distances. The experiments show that the proposed 3D object recognition and instance segmentation approach is effective enough to support object handling in robotic and intelligent systems.
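The abstract maps segmented (u, v) pixels onto the depth image to obtain (x, y, z) coordinates. A standard way to do this is pinhole back-projection, x = (u − c_x)·z/f_x and y = (v − c_y)·z/f_y. The sketch below assumes a depth image already registered to the RGB frame; the intrinsics (`fx`, `fy`, `cx`, `cy`) and `depth_scale` are hypothetical placeholders, not values from the paper.

```python
from typing import List, Sequence, Tuple


def backproject(
    pixels: Sequence[Tuple[int, int]],
    depth: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float,
    depth_scale: float = 0.001,
) -> List[Tuple[float, float, float]]:
    """Map segmented pixels (u, v) to camera-frame points (x, y, z).

    Assumes a pinhole camera and a depth image aligned with the RGB image.
    depth[v][u] holds raw depth units; depth_scale converts them to metres
    (0.001 assumes millimetres). Pixels with missing depth (0) are skipped.
    """
    points = []
    for u, v in pixels:
        raw = depth[v][u]
        if raw == 0:
            continue
        z = raw * depth_scale
        points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Hypothetical 2x2 depth patch (millimetres) and intrinsics, for illustration only.
depth = [[1000, 0], [2000, 1000]]
mask_pixels = [(0, 0), (1, 0), (0, 1), (1, 1)]
cloud = backproject(mask_pixels, depth, fx=500.0, fy=500.0, cx=0.0, cy=0.0)
print(cloud)
```

Skipping zero-depth pixels matters in practice, because consumer RGB-D sensors often leave holes at object edges that would otherwise collapse to the camera origin.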
Funding: Supported by the National Key Research and Development Program of China (2021YFF0901700).
Abstract: Different objects in Chinese paintings carry rich cultural connotations. Segmenting and extracting these objects through technical methods is an effective way to enhance cultural added value and activate cultural resources. Although existing deep learning methods can extract multi-level features for instance segmentation, they do not fully utilize the location relationship features of instances, resulting in poor segmentation of traditional Chinese painting (TCP) images. In this paper, a novel TCP image instance segmentation algorithm that integrates spatial structure characteristics (SSC), called SSC-Net, is proposed. First, considering characteristics of TCP images such as gradual color blending and discontinuous contour lines, an instance information entropy is proposed, composed of a color entropy formed by regional variance and a contour entropy formed by contour point regression. Then, because existing network structures struggle to fully consider the location relationship features of instances in TCP images, a Chinese painting instance segmentation network framework based on the residual neural network (ResNet) structure is proposed, composed of a mask branch and a position branch that can integrate spatial structure features. Finally, the color entropy and contour entropy are input into the mask branch and position branch of SSC-Net, respectively, to realize TCP instance segmentation. Quantitative and qualitative experiments on a challenging TCP database show that, compared with state-of-the-art algorithms in the same category, SSC-Net achieves good results, with an average precision (AP) of 53.89% at 25.8 frames per second (FPS). The segmentation results meet practical application requirements.
Funding: Supported by the Qinchuangyuan Project of Shaanxi Province (Grant No. 2023KXJ-016).
Abstract: To enable efficient and low-cost automated apple harvesting, this study presented a multi-class instance segmentation model, SCAL (Star-CAA-LADH), which uses a single RGB sensor for image acquisition. The model accurately segments fruits, fruit-bearing branches, and main branches from a single RGB image, providing comprehensive visual input for robotic harvesting. A Star-CAA module was proposed by integrating the Star operation with a Context-Anchored Attention (CAA) mechanism, enhancing directional sensitivity and multi-scale feature perception. The Backbone and Neck networks were equipped with hierarchically structured SCA-T/F modules to improve the fusion of high- and low-level features, resulting in more continuous masks and sharper boundaries. In the Head network, a Segment_LADH module was employed to optimize classification, bounding box regression, and mask generation, improving segmentation accuracy for small and adherent targets. To enhance robustness in adverse weather, a Chain-of-Thought Prompted Adaptive Enhancer (CPA) module was integrated, increasing model resilience in degraded environments. Experimental results demonstrate that SCAL achieves 94.9% AP_M and 95.1% mAP_M, outperforming YOLOv11s by 6.6% and 4.6%, respectively. Under multi-weather testing conditions, the CPA-SCAL variant consistently outperforms the other comparison models in accuracy. After INT8 quantization, the model size was reduced to 14.5 MB, with an inference speed of 47.2 frames per second (fps) on the NVIDIA Jetson AGX Xavier. Experiments in simulated orchard environments validate the effectiveness and generalization capability of SCAL, demonstrating its suitability as an efficient and comprehensive visual solution for intelligent harvesting in complex agricultural settings.
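After INT8 quantization the SCAL model shrinks to 14.5 MB. The abstract does not describe the quantization scheme or toolchain, so the sketch below only illustrates the general idea with symmetric, per-tensor linear quantization of a list of weights. Each float is stored as an 8-bit integer plus one shared scale, a quarter of the 4 bytes an FP32 weight needs. Deployment toolchains typically also calibrate activation ranges and often quantize per channel; none of that is shown, and this is not the authors' pipeline.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization: w ~= scale * q with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from INT8 codes."""
    return [scale * v for v in q]


w = [0.5, -1.27, 0.003, 1.0]
q, s = quantize_int8(w)
print(q)  # [50, -127, 0, 100]
print(dequantize(q, s))
```

For values inside the representable range, the reconstruction error is at most half a quantization step (scale / 2); the tiny weight 0.003 rounds to 0 here, which is the kind of precision loss that calibration and per-channel scales aim to limit.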
Funding: Supported by the National Key R&D Program of China (Grant No. 2023YFB4606502), the National Natural Science Foundation of China (Grant Nos. 51871183 and 51874245), and the Research Fund of the State Key Laboratory of Solidification Processing (NPU), China (Grant No. 2020-TS-06); sponsored by the Practice and Innovation Funds for Graduate Students of Northwestern Polytechnical University.
Abstract: Efficient and accurate segmentation of complex microstructures is a critical challenge in establishing the process-structure-property (PSP) linkages of materials. Deep learning (DL)-based instance segmentation algorithms show potential for achieving this goal. However, to ensure reliable predictions, current algorithms usually have complex structures and demand large amounts of training data. To reduce model complexity and dependence on data volume, we developed a DL framework based on a simple method called dual-layer semantics. In the framework, a data standardization module removes extraneous microstructural noise and accentuates the desired structural characteristics, while a post-processing module further improves segmentation accuracy. The framework was successfully applied to a small dataset of bimodal Ti-6Al-4V microstructures with only 112 samples. Compared with the ground truth, it achieves an intersection over union (IoU) of 86.81% for the globular α phase and an average size-distribution similarity of 94.70% for the colony structures. More importantly, it takes only 36 s to process a 1024 × 1024 micrograph, much faster than an experienced expert (typically 900 s). The framework proved reliable, interpretable, and scalable, enabling its use on complex microstructures to deepen the understanding of PSP linkages.
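IoU, the metric behind the 86.81% figure for the globular α phase, is the size of the overlap between the predicted and ground-truth masks divided by the size of their union. A minimal NumPy sketch for binary masks follows; `mask_iou` is an illustrative name, and treating two empty masks as IoU = 1 is a convention assumed here.

```python
import numpy as np


def mask_iou(pred, gt) -> float:
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:          # both masks empty: assumed perfect agreement
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
gt = np.array([[1, 0, 0],
               [0, 1, 1]])
print(mask_iou(pred, gt))   # 2 overlapping pixels / 4 in the union = 0.5
```

The timing claim reduces to similar arithmetic: 900 s / 36 s = 25, roughly a 25× speedup over an experienced expert.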
Funding: Supported by the China Agriculture Research System of MOF and MARA (CARS-20), the Zhejiang Provincial Key Laboratory of Agricultural Intelligent Equipment and Robotics Open Fund (2023ZJZD2301), the Chinese Academy of Agricultural Science and Technology Innovation Project “Fruit and Vegetable Production and Processing Technical Equipment Team” (2024), and the Beijing Nova Program (20220484023).
Abstract: Edible mushrooms are rich in nutrients; however, harvesting still relies mainly on manual labor. For a robotic arm to pick edible mushrooms accurately, each mushroom must first be coarsely localized. Previous studies used detection algorithms that ignore pixel-level mushroom information, so this information is lost when their outputs are combined with a depth map. Moreover, among instance segmentation algorithms, convolutional neural network (CNN)-based methods are lightweight, but the features they extract are not globally correlated. To guarantee real-time localization and improve the accuracy of mushroom segmentation, this study proposes a spatial-channel transformer network model based on Mask-RCNN (SCT-Mask-RCNN). Fusing Mask-RCNN with a self-attention mechanism captures global correlations among image features along the channel and spatial dimensions. Mask-RCNN is also used to keep the structure lightweight and to extract local features with a spatial pyramid pooling structure, achieving multiscale local feature fusion and improving detection accuracy. The results show that SCT-Mask-RCNN achieves a segmentation accuracy of 0.750 on segm_Precision_mAP and a detection accuracy of 0.638 on Bbox_Precision_mAP. Compared with existing methods, the proposed method improves Bbox_Precision_mAP and segm_Precision_mAP by more than 2% and 5%, respectively.
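The abstract states that self-attention is used to capture global correlations along the channel dimension. A generic channel self-attention, sketched below under that assumption, flattens a (C, H, W) feature map, builds a C × C channel-affinity matrix, normalizes each row with a softmax, and re-mixes the channels with it. This is a simplified illustration, not the SCT-Mask-RCNN module: practical modules typically add learnable projections and a residual connection, which are omitted here, and the function name is hypothetical.

```python
import numpy as np


def channel_self_attention(x: np.ndarray):
    """Simplified channel self-attention over a (C, H, W) feature map.

    Returns the re-weighted features (same shape) and the (C, C) attention map.
    No learnable weights or residual path: an illustrative assumption only.
    """
    c = x.shape[0]
    f = x.reshape(c, -1)                                  # (C, H*W)
    energy = f @ f.T                                      # channel affinities (C, C)
    energy = energy - energy.max(axis=1, keepdims=True)   # numerically stable softmax
    attn = np.exp(energy)
    attn = attn / attn.sum(axis=1, keepdims=True)         # each row sums to 1
    out = attn @ f                                        # each channel mixes all channels
    return out.reshape(x.shape), attn


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
out, attn = channel_self_attention(feat)
print(out.shape, attn.shape)   # (8, 16, 16) (8, 8)
```

Because every output channel is a weighted combination of all input channels, computed from the whole spatial extent at once, the result carries the global, image-wide correlations that a stack of local convolutions does not model directly.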
Funding: This research was supported by BB21 plus, funded by Busan Metropolitan City and the Busan Institute for Talent and Lifelong Education (BIT), and by a grant from the Tongmyong University Innovated University Research Park (I-URP), funded by Busan Metropolitan City, Republic of Korea.
Abstract: 3D segmentation is the process of partitioning point cloud data into several homogeneous regions such that points in the same region share the same attributes. Segmenting point clouds is challenging because of their substantial redundancy, fluctuating sample density, and lack of apparent organization. The research area has a wide range of robotics applications, including intelligent vehicles and autonomous mapping and navigation, and many researchers have introduced methodologies and algorithms for it. Deep learning, now a prevailing AI method, has been applied successfully across a spectrum of 2D vision domains. However, because of the specific problems of processing point clouds with deep neural networks, deep learning on point clouds is still in its early stages. This study examines many of the strategies proposed for 3D instance and semantic segmentation and gives a comprehensive assessment of current developments in deep learning-based 3D segmentation. The benefits, drawbacks, and design mechanisms of these approaches are studied and discussed. The study also compares the performance of various segmentation algorithms on publicly accessible datasets and covers the most commonly used pipelines, their advantages and limitations, insightful findings, and promising future research directions.
Funding: Supported by the National Natural Science Foundation of China (52175236) and the Qingdao People’s Livelihood Science and Technology Plan (19-6-1-88-nsh).
Abstract: In real traffic scenarios, precise recognition of traffic participants such as vehicles and pedestrians is crucial for intelligent transportation. This study proposes an improved algorithm built on Mask-RCNN to enhance the ability of autonomous driving systems to recognize traffic participants. The algorithm incorporates long short-term memory (LSTM) networks and a fused attention module (GSAM, GCT, and a Spatial Attention Module) to strengthen its processing of both global and local information. In addition, to improve the network's stability in the initial stage of operation, the original activation function was replaced with the Gaussian error linear unit (GELU). Experiments were conducted on the publicly available Cityscapes dataset. The test results show that the revised algorithm outperforms the original on AP_(50), AP_(75), and other metrics, by 8.7% and 9.6% for object detection and by 12.5% and 13.3% for segmentation.
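GELU, which replaces the original activation in this algorithm, is defined as GELU(x) = x·Φ(x), where Φ is the standard normal CDF; with the error function this is 0.5·x·(1 + erf(x/√2)). The sketch below computes the exact form and the widely used tanh approximation side by side with ReLU; the function names are illustrative and not taken from the paper.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Common tanh approximation of GELU (agrees with gelu to within ~1e-3)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


for v in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"{v:+.1f}  relu={max(v, 0.0):.4f}  gelu={gelu(v):.4f}  approx={gelu_tanh(v):.4f}")
```

Unlike ReLU, GELU is smooth everywhere and maps small negative inputs to small negative outputs instead of clipping them to zero, while behaving almost like the identity for large positive inputs.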