Weakly Supervised Semantic Segmentation (WSSS), which relies only on image-level labels, has attracted significant attention for its cost-effectiveness and scalability. Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations. However, they often neglect the complex contextual dependencies among image patches, resulting in incomplete local representations and limited segmentation accuracy. To address these issues, we propose the Context Patch Fusion with Class Token Enhancement (CPF-CTE) framework, which exploits contextual relations among patches to enrich feature representations and improve segmentation. At its core, the Contextual-Fusion Bidirectional Long Short-Term Memory (CF-BiLSTM) module captures spatial dependencies between patches and enables bidirectional information flow, yielding a more comprehensive understanding of spatial correlations. This strengthens feature learning and segmentation robustness. Moreover, we introduce learnable class tokens that dynamically encode and refine class-specific semantics, enhancing discriminative capability. By effectively integrating spatial and semantic cues, CPF-CTE produces richer and more accurate representations of image content. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.
This paper presents an intelligent patrol and security robot integrating 2D LiDAR and RGB-D vision sensors to achieve semantic simultaneous localization and mapping (SLAM), real-time object recognition, and dynamic obstacle avoidance. The system employs the YOLOv7 deep-learning framework for semantic detection and SLAM for localization and mapping, fusing geometric and visual data to build a high-fidelity 2D semantic map. This map enables the robot to identify and project object information for improved situational awareness. Experimental results show that object recognition reached 95.4% mAP@0.5. Semantic completeness increased from 68.7% (single view) to 94.1% (multi-view) with an average position error of 3.1 cm. During navigation, the robot achieved 98.0% reliability, avoided moving obstacles in 90.0% of encounters, and replanned paths in 0.42 s on average. The integration of LiDAR-based SLAM with deep-learning-driven semantic perception establishes a robust foundation for intelligent, adaptive, and safe robotic navigation in dynamic environments.
Regular detection of pavement cracks is essential for infrastructure maintenance. However, existing methods often ignore challenges such as the continuous evolution of crack features between video frames and the difficulty of defect quantification. To this end, this paper proposes an integrated Transformer-based framework for pavement crack detection, segmentation, tracking, and counting. First, we design the VitSeg-Det network, an integrated detection and segmentation network that can accurately locate and segment tiny cracks in complex scenes. Second, the TransTra-Count system is developed to count defects automatically by combining defect tracking with width estimation. Finally, we conduct experimental verification on three datasets. The results show that the proposed method is superior to existing deep learning methods in detection accuracy. In addition, a test on video of a real scene shows that the framework can accurately label defect locations and output the number of defects in real time.
Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods, which often demand extensive computational resources and struggle with diverse data acquisition techniques. This research presents a novel approach for vehicle classification and recognition in aerial image sequences, integrating multiple advanced techniques to enhance detection accuracy. The proposed model begins with preprocessing using Multiscale Retinex (MSR) to enhance image quality, followed by Expectation-Maximization (EM) Segmentation for precise foreground object identification. Vehicle detection is performed using the YOLOv10 framework, while feature extraction incorporates Maximally Stable Extremal Regions (MSER), Dense Scale-Invariant Feature Transform (Dense SIFT), and Zernike Moments Features to capture distinct object characteristics. The features are then refined through a Hybrid Swarm-based Optimization algorithm, which selects features for improved classification performance. The final classification is conducted using a Vision Transformer. Experimental evaluations on benchmark datasets, including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset (UAVID), demonstrate the superiority of the proposed approach, which achieves an accuracy of 94.40% on UAVDT and 93.57% on UAVID. The results show that the model outperforms existing methodologies in vehicle detection and classification in aerial imagery, offering a statistically validated improvement for intelligent traffic monitoring systems.
In the age of big data, ensuring data privacy while enabling efficient encrypted data retrieval has become a critical challenge. Traditional searchable encryption schemes face difficulties in handling complex semantic queries. Additionally, they typically rely on honest-but-curious cloud servers, which introduces the risk of repudiation. Furthermore, the combined operations of search and verification increase system load, thereby reducing performance. Traditional verification mechanisms, which rely on complex hash constructions, suffer from low verification efficiency. To address these challenges, this paper proposes a blockchain-based contextual semantic-aware ciphertext retrieval scheme with efficient verification. Building on existing single-keyword and multi-keyword search methods, the scheme uses vector models to semantically train the dataset, enabling it to retain semantic information and achieve context-aware encrypted retrieval, significantly improving search accuracy. Additionally, a blockchain-based updatable master-slave chain storage model is designed, where the master chain stores encrypted keyword indexes and the slave chain stores verification information generated by zero-knowledge proofs, thus balancing system load while improving search and verification efficiency. Finally, an improved non-interactive zero-knowledge proof mechanism is introduced, reducing the computational complexity of verification and ensuring efficient validation of search results. Experimental results demonstrate that the proposed scheme offers stronger security, balanced overhead, and higher search verification efficiency.
Chinese abbreviations improve communicative efficiency by extracting key components from longer expressions. They are widely used in both daily communication and professional domains. However, existing abbreviation generation methods still face two major challenges. First, sequence-labeling-based approaches often neglect contextual meaning by making binary decisions at the character level, leading to abbreviations that fail to capture semantic completeness. Second, generation-based methods rely heavily on a single decoding process, which frequently produces correct abbreviations but ranks them lower due to inadequate semantic evaluation. To address these limitations, we propose a novel two-stage framework with Generation–Iterative Optimization for Abbreviation (GIOA). In the first stage, we design a Chain-of-Thought prompting strategy and incorporate definitional and situational contexts to generate multiple abbreviation candidates. In the second stage, we introduce a Semantic Preservation Dynamic Adjustment mechanism that alternates between character-level importance estimation and semantic restoration to optimize candidate ranking. Experiments on two public benchmark datasets show that our method outperforms existing state-of-the-art approaches, achieving Hit@1 improvements of 15.15% and 13.01%, respectively, while maintaining consistent results in Hit@3.
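The Hit@1 and Hit@3 figures quoted above are ranking metrics. As a minimal sketch, assuming the usual definition (the share of test items whose reference abbreviation appears among the top k ranked candidates), they could be computed as follows. The function name `hit_at_k` and the sample data are hypothetical and are not taken from the GIOA paper, whose exact evaluation protocol is not stated in the abstract.

```python
from typing import Sequence


def hit_at_k(
    ranked_candidates: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of items whose reference is in the top-k candidates.

    Assumes one reference abbreviation per item and candidates already
    sorted best-first (an illustrative simplification).
    """
    if len(ranked_candidates) != len(references):
        raise ValueError("candidates and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        ref in candidates[:k]
        for candidates, ref in zip(ranked_candidates, references)
    )
    return hits / len(references)


# Hypothetical ranked candidate lists for three full forms.
candidates = [
    ["北大", "京大"],    # reference ranked first
    ["清大", "清华"],    # reference ranked second
    ["人大", "中人大"],  # reference missing
]
references = ["北大", "清华", "复旦"]

hit1 = hit_at_k(candidates, references, k=1)
hit3 = hit_at_k(candidates, references, k=3)
```

A correct abbreviation that is generated but ranked second counts toward Hit@3 and not Hit@1, which is the gap the second-stage re-ranking described in the abstract is meant to close.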
High-resolution remote sensing images (HRSIs) are now an essential data source for gathering surface information due to advancements in remote sensing data capture technologies. However, their significant scale changes and wealth of spatial details pose challenges for semantic segmentation. While convolutional neural networks (CNNs) excel at capturing local features, they are limited in modeling long-range dependencies. Conversely, transformers utilize multihead self-attention to integrate global context effectively, but this approach often incurs a high computational cost. This paper proposes a global-local multiscale context network (GLMCNet) to extract both global and local multiscale contextual information from HRSIs. A detail-enhanced filtering module (DEFM) is proposed at the end of the encoder to refine the encoder outputs further, thereby enhancing the key details extracted by the encoder and effectively suppressing redundant information. In addition, a global-local multiscale transformer block (GLMTB) is proposed in the decoding stage to enable the modeling of rich multiscale global and local information. We also design a stair fusion mechanism to transmit deep semantic information from deep to shallow layers progressively. Finally, we propose the semantic awareness enhancement module (SAEM), which further enhances the representation of multiscale semantic features through spatial attention and covariance channel attention. Extensive ablation analyses and comparative experiments were conducted to evaluate the performance of the proposed method. Specifically, our method achieved a mean Intersection over Union (mIoU) of 86.89% on the ISPRS Potsdam dataset and 84.34% on the ISPRS Vaihingen dataset, outperforming existing models such as ABCNet and BANet.
Deep learning-based methods have become alternatives to traditional numerical weather prediction systems, offering faster computation and the ability to utilize large historical datasets. However, the application of deep learning to medium-range regional weather forecasting with limited data remains a significant challenge. In this work, three key solutions are proposed: (1) motivated by the need to improve model performance in data-scarce regional forecasting scenarios, the authors apply semantic segmentation models to better capture spatiotemporal features and improve prediction accuracy; (2) recognizing the challenge of overfitting and the inability of traditional noise-based data augmentation methods to effectively enhance model robustness, a novel learnable Gaussian noise mechanism is introduced that allows the model to adaptively optimize perturbations for different locations; and (3) to address error accumulation in autoregressive prediction, as well as the learning difficulty and lack of intermediate data utilization in one-shot prediction, the authors propose a cascade prediction approach that mitigates these problems while significantly improving forecasting performance. The method achieves a competitive result in the East China Regional AI Medium Range Weather Forecasting Competition. Ablation experiments further validate the effectiveness of each component, highlighting their contributions to enhancing prediction performance.
[Objective] Leaf diseases significantly affect both the yield and quality of tea throughout the year. To address the insufficiently fine segmentation of current tea spot segmentation models, this research proposed a novel model for diagnosing the severity of tea spots, designated MDC-U-Net3+, which enhances segmentation accuracy on the base framework of U-Net3+. [Methods] A multi-scale feature fusion module (MSFFM) was incorporated into the backbone network of U-Net3+ to obtain feature information across multiple receptive fields of diseased spots, thereby reducing the loss of features within the encoder. Dual multi-scale attention (DMSA) was incorporated into the skip connection process to mitigate the segmentation boundary ambiguity issue. This integration facilitates the comprehensive fusion of fine-grained and coarse-grained semantic information at full scale. Furthermore, the segmented mask image was refined with conditional random fields (CRF) to further optimize the segmentation results. [Results and Discussions] The improved model MDC-U-Net3+ achieved a mean pixel accuracy (mPA) of 94.92% and a mean Intersection over Union (mIoU) of 90.9%. Compared with the mPA and mIoU of U-Net3+, the MDC-U-Net3+ model showed improvements of 1.85 and 2.12 percentage points, respectively. These results indicated more effective segmentation performance than that achieved by other classical semantic segmentation models. [Conclusions] The methodology presented herein could provide data support for automated disease detection and precise medication, consequently reducing the losses associated with tea diseases.
In image analysis, high-precision semantic segmentation predominantly relies on supervised learning. Despite significant advancements driven by deep learning techniques, challenges such as class imbalance and dynamic performance evaluation persist. Traditional weighting methods, often based on pre-statistical class counting, tend to overemphasize certain classes while neglecting others, particularly rare sample categories. Approaches like focal loss and other rare-sample segmentation techniques introduce multiple hyperparameters that require manual tuning, leading to increased experimental costs due to their instability. This paper proposes a novel CAWASeg framework to address these limitations. Our approach leverages Grad-CAM technology to generate class activation maps, identifying key feature regions that the model focuses on during decision-making. We introduce a Comprehensive Segmentation Performance Score (CSPS) to dynamically evaluate model performance by converting these activation maps into pseudo masks and comparing them with the ground truth. Additionally, we design two adaptive weights for each class: a Basic Weight (BW) and a Ratio Weight (RW), which the model adjusts during training based on real-time feedback. Extensive experiments on the COCO-Stuff, CityScapes, and ADE20k datasets demonstrate that our CAWASeg framework significantly improves segmentation performance for rare sample categories while enhancing overall segmentation accuracy. The proposed method offers a robust and efficient solution for addressing class imbalance in semantic segmentation tasks.
Weakly supervised semantic segmentation (WSSS) is a difficult task because only category information is provided for segmentation prediction. Thus, the key stage of WSSS is generating the pseudo labels. Convolutional neural network (CNN) based methods use class activation mapping (CAM) to obtain the pseudo labels, but CAM concentrates only on the most discriminative parts. More recently, transformer-based methods have used attention maps from the multi-headed self-attention (MHSA) module to predict pseudo labels, which usually contain obvious background noise and incoherent object areas. To solve these problems, we use the Conformer as our backbone, which is a parallel network combining a CNN and a Transformer. The two branches generate pseudo labels and refine them independently, and can effectively combine the advantages of CNN and Transformer. However, the parallel structure does not exchange information closely enough between the branches, which can result in poor detail in the pseudo labels, and the background noise still exists. To alleviate this problem, we propose the enhancing convolution CAM (ECCAM) model, which has three improved modules based on enhancing convolution: a deeper stem (DStem), a convolutional feed-forward network (CFFN), and a feature coupling unit with convolution (FCUConv). ECCAM gives Conformer tighter interaction between the CNN and Transformer branches. Experiments verify that the proposed modules help the network perceive more local information from images, making the final segmentation results more refined. Compared with similar architectures, our modules greatly improve semantic segmentation performance and achieve 70.2% mean intersection over union (mIoU) on the PASCAL VOC 2012 dataset.
As key nodes of modern transportation networks, road tunnels require information-based management to ensure operational safety and traffic efficiency. However, existing tunnel vehicle modeling methods generally suffer from insufficient 3D scene description capability and low dynamic update efficiency, making it difficult to meet the demand for real-time, accurate management. For this reason, this paper proposes a vehicle twin modeling method for road tunnels. Starting from actual management needs, the approach supports multi-level dynamic modeling of vehicle type, size, and color by constructing a vehicle model library that can be flexibly invoked. At the same time, semantic constraint rules covering geometric layout, behavioral attributes, and spatial relationships are designed to ensure that the virtual model closely matches the real vehicle. Finally, a prototype system is constructed and case experiments are conducted in a selected case area, where real-time monitoring data are integrated with the semantic constraints for precise virtual-real mapping, realizing dynamic updating and three-dimensional visualization of vehicle states in the tunnel. The experiments show that the proposed method runs smoothly with an average rendering efficiency of 17.70 ms while guaranteeing modeling accuracy (composite similarity of 0.867), which significantly improves the real-time performance and intuitiveness of tunnel management. The research results provide reliable technical support for the intelligent operation and emergency response of road tunnels, and offer new ideas for digital twin modeling of complex scenes.
Laser-directed energy deposition (L-DED) is an advanced additive manufacturing technology primarily adopted in metal three-dimensional printing systems. The L-DED process is characterized by various defects, thus necessitating the extensive use of in-situ monitoring to enable real-time adjustments of process parameters by detecting molten-pool features. To address the challenge of accurately extracting the molten-pool morphology from undetached spatter, an innovative monitoring method based on the U-Net (U-shaped network) is proposed herein. A lightweight architecture accelerates the processing speed, whereas an enhanced loss function incorporating weight maps augments the segmentation precision. The model performance is evaluated by comparing its segmentation accuracy and processing speed with those of the conventional U-Net, using the mean intersection over union (MIoU) as the segmentation metric. The improved model demonstrates superior segmentation accuracy at the interface between the molten pool and spatter, with a peak MIoU of 0.9798 achieved on the test set. Furthermore, this model processes each image in 17.9 ms. Using this segmentation algorithm, the error in extracting the molten-pool width from single-track experiments is within 0.1 mm. The proposed method for monitoring the molten-pool morphology is suitable for deployment in online monitoring systems, thus providing a foundation for subsequent process-parameter regulation.
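Several abstracts in this list report mean intersection over union (mIoU or MIoU) as the segmentation metric. The following is a minimal sketch of the standard definition: per-class IoU is the overlap between predicted and ground-truth pixels divided by their union, averaged over the classes that occur. The function name `mean_iou`, the choice to skip classes absent from both masks, and the 2x2 toy masks (0 = background, 1 = molten pool) are illustrative assumptions and do not come from any of the papers.

```python
from itertools import chain
from math import nan
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Mean IoU over classes for two label masks given as 2D sequences.

    Classes that appear in neither mask are skipped (an assumption;
    some toolkits count them as IoU 1 or 0 instead).
    """
    p = list(chain.from_iterable(pred))
    t = list(chain.from_iterable(target))
    if len(p) != len(t):
        raise ValueError("pred and target must contain the same number of pixels")
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for a, b in zip(p, t) if a == c and b == c)
        union = sum(1 for a, b in zip(p, t) if a == c or b == c)
        if union:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else nan


# Toy masks: class 0 has IoU 1/2, class 1 has IoU 2/3.
pred_mask = [[0, 1], [1, 1]]
true_mask = [[0, 1], [0, 1]]
score = mean_iou(pred_mask, true_mask, num_classes=2)
```

Because each class contributes equally to the average, a small region such as the boundary between molten pool and spatter can lower the score noticeably even when overall pixel accuracy is high.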
The Internet of Vehicles (IoV) has become an important direction in the field of intelligent transportation, in which vehicle positioning is a crucial part. Simultaneous Localization and Mapping (SLAM) technology plays a crucial role in vehicle localization and navigation. Traditional SLAM systems are designed for static environments, and their accuracy and robustness degrade in dynamic environments where objects are in constant movement. To address this issue, a new real-time visual SLAM system called MG-SLAM has been developed. Based on ORB-SLAM2, MG-SLAM incorporates a dynamic target detection process that enables the detection of both known and unknown moving objects. In this process, a separate semantic segmentation thread is required to segment dynamic target instances, and the Mask R-CNN algorithm is run on the Graphics Processing Unit (GPU) to accelerate segmentation. To reduce computational cost, only key frames are segmented to identify known dynamic objects. Additionally, a multi-view geometry method is adopted to detect unknown moving objects. The results demonstrate that MG-SLAM achieves higher precision, with an improvement from 0.2730 m to 0.0135 m. Moreover, the processing time required by MG-SLAM is significantly reduced compared to other dynamic scene SLAM algorithms, which illustrates its efficacy in locating objects in dynamic scenes.
Karst fractures serve as crucial seepage channels and storage spaces for carbonate natural gas reservoirs, and electrical image logs are vital data for visualizing and characterizing such fractures. However, the conventional approach of identifying fractures using electrical image logs predominantly relies on manual processes that are not only time-consuming but also highly subjective. In addition, the heterogeneity and strong dissolution tendency of karst carbonate reservoirs lead to complexity and variety in fracture geometry, which makes it difficult to accurately identify fractures. In this paper, the electrical image logs network (EILnet), a deep-learning-based intelligent semantic segmentation model with a selective attention mechanism and selective feature fusion module, was created to enable the intelligent identification and segmentation of different types of fractures through electrical logging images. Data from electrical image logs representing structural and induced fractures were first selected using the sliding window technique before image inpainting and data augmentation were implemented for these images to improve the generalizability of the model. Various image-processing tools, including the bilateral filter, Laplace operator, and Gaussian low-pass filter, were also applied to the electrical logging images to generate a multi-attribute dataset to help the model learn the semantic features of the fractures. The results demonstrated that the EILnet model outperforms mainstream deep-learning semantic segmentation models, such as Fully Convolutional Networks (FCN-8s), U-Net, and SegNet, for both the single-channel dataset and the multi-attribute dataset. The EILnet provided significant advantages for the single-channel dataset, and its mean intersection over union (MIoU) and pixel accuracy (PA) were 81.32% and 89.37%, respectively. In the case of the multi-attribute dataset, the identification capability of all models improved to varying degrees, with the EILnet achieving the highest MIoU and PA of 83.43% and 91.11%, respectively. Further, applying the EILnet model to various blind wells demonstrated its ability to provide reliable fracture identification, thereby indicating its promising potential applications.
To avoid the laborious annotation process for dense prediction tasks like semantic segmentation, unsupervised domain adaptation (UDA) methods have been proposed to leverage the abundant annotations from a source domain, such as a virtual world (e.g., 3D games), and adapt models to the target domain (the real world) by narrowing the domain discrepancies. However, because of the large domain gap, directly aligning two distinct domains without considering the intermediates leads to inefficient alignment and inferior adaptation. To address this issue, we propose a novel learnable evolutionary Category Intermediates (CIs) guided UDA model named Leci, which enables the information transfer between the two domains via two processes, i.e., Distilling and Blending. Starting from a random initialization, the CIs learn shared category-wise semantics automatically from two domains in the Distilling process. Then, the learned semantics in the CIs are sent back to blend the domain features through a residual attentive fusion (RAF) module, such that the category-wise features of both domains shift towards each other. As the CIs progressively and consistently learn from the varying feature distributions during training, they are evolutionary to guide the model to achieve category-wise feature alignment. Experiments on both GTA5 and SYNTHIA datasets demonstrate Leci's superiority over prior representative methods.
Audio-visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals, coupled with the complex spatial patterns of objects and textures found in visual images. The focus of recent studies has predominantly revolved around extracting features from diverse neural network structures, inadvertently neglecting the acquisition of semantically meaningful regions and crucial components within audio-visual data. The authors present a feature pyramid attention network (FPANet) for audio-visual scene understanding, which extracts semantically significant characteristics from audio-visual data. The authors' approach builds multi-scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module (FPAM). A dimension alignment (DA) strategy is employed to align feature maps from multiple layers, a pyramid spatial attention (PSA) to spatially locate essential regions, and a pyramid channel attention (PCA) to pinpoint significant temporal frames. Experiments on visual scene classification (VSC), audio scene classification (ASC), and AVSC tasks demonstrate that FPANet achieves performance on par with state-of-the-art (SOTA) approaches, with a 95.9 F1-score on the ADVANCE dataset and a relative improvement of 28.8%. Visualisation results show that FPANet can prioritise semantically meaningful areas in audio-visual signals.
Ecological monitoring vehicles are equipped with a range of sensors and monitoring devices designed to gather data on ecological and environmental factors. These vehicles are crucial in various fields, including environmental science research, ecological and environmental monitoring projects, disaster response, and emergency management. A key method employed in these vehicles for achieving high-precision positioning is LiDAR (light detection and ranging)-Visual Simultaneous Localization and Mapping (SLAM). However, maintaining high-precision localization in complex scenarios, such as degraded environments or when dynamic objects are present, remains a significant challenge. To address this issue, we integrate both semantic and texture information from LiDAR and cameras to enhance the robustness and efficiency of data registration. Specifically, semantic information simplifies the modeling of scene elements, reducing the reliance on dense point clouds, which can be less efficient. Meanwhile, visual texture information complements LiDAR-Visual localization by providing additional contextual details. By incorporating semantic and texture details from paired images and point clouds, we significantly improve the quality of data association, thereby increasing the success rate of localization. This approach not only enhances the operational capabilities of ecological monitoring vehicles in complex environments but also contributes to improving the overall efficiency and effectiveness of ecological monitoring and environmental protection efforts.
An improved model based on you only look once version 8 (YOLOv8) is proposed to solve the problem of low detection accuracy caused by the diversity of object sizes in optical remote sensing images. First, the feature pyramid network (FPN) structure of the original YOLOv8 model is replaced by the generalized-FPN (GFPN) structure from GiraffeDet to realize "cross-layer" and "cross-scale" adaptive feature fusion, enriching the semantic and spatial information in the feature map and improving the target detection ability of the model. Second, a pyramid pooling module, multi atrous spatial pyramid pooling (MASPP), is designed using the ideas of atrous convolution and the feature pyramid structure to extract multi-scale features, improving the model's ability to process multi-scale objects. The experimental results show that the improved YOLOv8 model reaches a detection accuracy of 92% and a mean average precision (mAP) of 87.9% on the DIOR dataset, which are 3.5% and 1.7% higher, respectively, than those of the original model. This shows that the detection and classification ability of the proposed model on multi-scale optical remote sensing targets has been improved.
Abstract: Weakly Supervised Semantic Segmentation (WSSS), which relies only on image-level labels, has attracted significant attention for its cost-effectiveness and scalability. Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations. However, they often neglect the complex contextual dependencies among image patches, resulting in incomplete local representations and limited segmentation accuracy. To address these issues, we propose the Context Patch Fusion with Class Token Enhancement (CPF-CTE) framework, which exploits contextual relations among patches to enrich feature representations and improve segmentation. At its core, the Contextual-Fusion Bidirectional Long Short-Term Memory (CF-BiLSTM) module captures spatial dependencies between patches and enables bidirectional information flow, yielding a more comprehensive understanding of spatial correlations. This strengthens feature learning and segmentation robustness. Moreover, we introduce learnable class tokens that dynamically encode and refine class-specific semantics, enhancing discriminative capability. By effectively integrating spatial and semantic cues, CPF-CTE produces richer and more accurate representations of image content. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.
Funding: Supported by the National Science and Technology Council under Grant NSTC 114-2221-E-130-007.
Abstract: This paper presents an intelligent patrol and security robot integrating 2D LiDAR and RGB-D vision sensors to achieve semantic simultaneous localization and mapping (SLAM), real-time object recognition, and dynamic obstacle avoidance. The system employs the YOLOv7 deep-learning framework for semantic detection and SLAM for localization and mapping, fusing geometric and visual data to build a high-fidelity 2D semantic map. This map enables the robot to identify and project object information for improved situational awareness. Experimental results show that object recognition reached 95.4% mAP@0.5. Semantic completeness increased from 68.7% (single view) to 94.1% (multi-view) with an average position error of 3.1 cm. During navigation, the robot achieved 98.0% reliability, avoided moving obstacles in 90.0% of encounters, and replanned paths in 0.42 s on average. The integration of LiDAR-based SLAM with deep-learning-driven semantic perception establishes a robust foundation for intelligent, adaptive, and safe robotic navigation in dynamic environments.
Funding: Supported in part by the Natural Science Foundation of Shaanxi Province of China under Grant 2024JC-YBQN-0695.
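The mAP@0.5 figure quoted above rests on an intersection-over-union (IoU) test between predicted and ground-truth boxes. The following is a minimal sketch of that test only, not the paper's evaluation code; the `(x1, y1, x2, y2)` box layout and the function names `box_iou` and `is_true_positive` are assumptions made for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2). Layout is an assumption."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection counts as a match when IoU reaches the threshold (0.5 for mAP@0.5)."""
    return box_iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))           # overlap 1, union 7
    print(is_true_positive((0, 0, 2, 2), (0, 0, 2, 1)))  # IoU is exactly 0.5
```

A full mAP computation additionally ranks detections by confidence and averages precision over recall levels and classes; that part is omitted here.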
Abstract: Regular detection of pavement cracks is essential for infrastructure maintenance. However, existing methods often ignore challenges such as the continuous evolution of crack features between video frames and the difficulty of defect quantification. To this end, this paper proposes an integrated Transformer-based framework for pavement crack detection, segmentation, tracking, and counting. First, we design the VitSeg-Det network, an integrated detection and segmentation network that can accurately locate and segment tiny cracks in complex scenes. Second, the TransTra-Count system is developed to automatically count defects by combining defect tracking with width estimation. Finally, we conduct experimental verification on three datasets. The results show that the proposed method is superior to existing deep learning methods in detection accuracy. In addition, tests on real-scene video show that the framework can accurately label defect locations and output the number of defects in real time.
Abstract: Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods, which often demand extensive computational resources and struggle with diverse data acquisition techniques. This research presents a novel approach for vehicle classification and recognition in aerial image sequences, integrating multiple advanced techniques to enhance detection accuracy. The proposed model begins with preprocessing using Multiscale Retinex (MSR) to enhance image quality, followed by Expectation-Maximization (EM) Segmentation for precise foreground object identification. Vehicle detection is performed using the YOLOv10 framework, while feature extraction incorporates Maximally Stable Extremal Regions (MSER), Dense Scale-Invariant Feature Transform (Dense SIFT), and Zernike Moments Features to capture distinct object characteristics. Feature selection is then refined through a Hybrid Swarm-based Optimization algorithm to improve classification performance. The final classification is conducted using a Vision Transformer. Experimental evaluations on benchmark datasets, including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset (UAVID), achieve an accuracy of 94.40% on UAVDT and 93.57% on UAVID. The results show that the model outperforms existing methodologies in vehicle detection and classification in aerial imagery and offers a statistically validated improvement for intelligent traffic monitoring systems.
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62262073, in part by the Yunnan Provincial Ten Thousand People Program for Young Top Talents under Grant YNWR-QNBJ-2019-237, and in part by the Yunnan Provincial Major Science and Technology Special Program under Grant 202402AD080002.
Abstract: In the age of big data, ensuring data privacy while enabling efficient encrypted data retrieval has become a critical challenge. Traditional searchable encryption schemes face difficulties in handling complex semantic queries. Additionally, they typically rely on honest-but-curious cloud servers, which introduces the risk of repudiation. Furthermore, the combined operations of search and verification increase system load, thereby reducing performance. Traditional verification mechanisms, which rely on complex hash constructions, suffer from low verification efficiency. To address these challenges, this paper proposes a blockchain-based contextual semantic-aware ciphertext retrieval scheme with efficient verification. Building on existing single- and multi-keyword search methods, the scheme uses vector models to semantically train the dataset, enabling it to retain semantic information and achieve context-aware encrypted retrieval, significantly improving search accuracy. Additionally, a blockchain-based updatable master-slave chain storage model is designed, where the master chain stores encrypted keyword indexes and the slave chain stores verification information generated by zero-knowledge proofs, thus balancing system load while improving search and verification efficiency. Finally, an improved non-interactive zero-knowledge proof mechanism is introduced, reducing the computational complexity of verification and ensuring efficient validation of search results. Experimental results demonstrate that the proposed scheme offers stronger security, balanced overhead, and higher search verification efficiency.
Funding: Supported by the National Key Research and Development Program of China (2020AAA0109300) and the Shanghai Collaborative Innovation Center of Data Intelligence Technology (No. 0232-A1-8900-24-13).
Abstract: Chinese abbreviations improve communicative efficiency by extracting key components from longer expressions. They are widely used in both daily communication and professional domains. However, existing abbreviation generation methods still face two major challenges. First, sequence-labeling-based approaches often neglect contextual meaning by making binary decisions at the character level, leading to abbreviations that fail to capture semantic completeness. Second, generation-based methods rely heavily on a single decoding process, which frequently produces correct abbreviations but ranks them lower due to inadequate semantic evaluation. To address these limitations, we propose a novel two-stage framework with Generation–Iterative Optimization for Abbreviation (GIOA). In the first stage, we design a Chain-of-Thought prompting strategy and incorporate definitional and situational contexts to generate multiple abbreviation candidates. In the second stage, we introduce a Semantic Preservation Dynamic Adjustment mechanism that alternates between character-level importance estimation and semantic restoration to optimize candidate ranking. Experiments on two public benchmark datasets show that our method outperforms existing state-of-the-art approaches, achieving Hit@1 improvements of 15.15% and 13.01%, respectively, while maintaining consistent results in Hit@3.
Funding: Provided by the Science Research Project of Hebei Education Department under grant No. BJK2024115.
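Because the abstract above reports results as Hit@1 and Hit@3, a short sketch of how a Hit@k score is usually computed may help. This is a generic illustration, not the authors' evaluation script; the function names and the toy strings are hypothetical.

```python
def hit_at_k(ranked_candidates, reference, k):
    """Return 1.0 if the reference appears among the top-k ranked candidates, else 0.0."""
    return 1.0 if reference in ranked_candidates[:k] else 0.0


def mean_hit_at_k(samples, k):
    """Average Hit@k over (ranked_candidates, reference) pairs."""
    if not samples:
        return 0.0
    return sum(hit_at_k(cands, ref, k) for cands, ref in samples) / len(samples)


if __name__ == "__main__":
    # Hypothetical toy data: the second reference is ranked second, not first.
    samples = [
        (["a", "b", "c"], "a"),
        (["x", "y", "z"], "y"),
    ]
    print(mean_hit_at_k(samples, 1))  # 0.5
    print(mean_hit_at_k(samples, 3))  # 1.0
```

The toy data shows why better ranking can raise Hit@1 while Hit@3 stays unchanged: the correct candidate was already in the list and only its position moved.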
Abstract: High-resolution remote sensing images (HRSIs) are now an essential data source for gathering surface information due to advancements in remote sensing data capture technologies. However, their significant scale changes and wealth of spatial details pose challenges for semantic segmentation. While convolutional neural networks (CNNs) excel at capturing local features, they are limited in modeling long-range dependencies. Conversely, transformers utilize multihead self-attention to integrate global context effectively, but this approach often incurs a high computational cost. This paper proposes a global-local multiscale context network (GLMCNet) to extract both global and local multiscale contextual information from HRSIs. A detail-enhanced filtering module (DEFM) is proposed at the end of the encoder to refine the encoder outputs further, thereby enhancing the key details extracted by the encoder and effectively suppressing redundant information. In addition, a global-local multiscale transformer block (GLMTB) is proposed in the decoding stage to enable the modeling of rich multiscale global and local information. We also design a stair fusion mechanism to transmit deep semantic information from deep to shallow layers progressively. Finally, we propose the semantic awareness enhancement module (SAEM), which further enhances the representation of multiscale semantic features through spatial attention and covariance channel attention. Extensive ablation analyses and comparative experiments were conducted to evaluate the performance of the proposed method. Specifically, our method achieved a mean Intersection over Union (mIoU) of 86.89% on the ISPRS Potsdam dataset and 84.34% on the ISPRS Vaihingen dataset, outperforming existing models such as ABCNet and BANet.
Funding: Supported by the National Natural Science Foundation of China [grant number 62376217]; the Young Elite Scientists Sponsorship Program by CAST [grant number 2023QNRC001]; and the Joint Research Project for Meteorological Capacity Improvement [grant number 24NLTSZ003].
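Several abstracts in this listing report mean Intersection over Union (mIoU). The sketch below shows one common way to compute it from flattened label maps, assuming integer class ids and skipping classes absent from both prediction and ground truth; benchmark toolkits may differ in such details, and the function name is hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """mIoU over flat sequences of integer class ids of equal length."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    inter = [0] * num_classes         # pixels where prediction and target agree, per class
    pred_count = [0] * num_classes    # pixels predicted as each class
    target_count = [0] * num_classes  # pixels labelled as each class
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            inter[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - inter[c]
        if union > 0:  # skip classes that never occur (an assumption of this sketch)
            ious.append(inter[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0: IoU 1/2, class 1: IoU 2/3, so mIoU = 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```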
Abstract: Deep learning-based methods have become alternatives to traditional numerical weather prediction systems, offering faster computation and the ability to utilize large historical datasets. However, the application of deep learning to medium-range regional weather forecasting with limited data remains a significant challenge. In this work, three key solutions are proposed: (1) motivated by the need to improve model performance in data-scarce regional forecasting scenarios, the authors apply semantic segmentation models to better capture spatiotemporal features and improve prediction accuracy; (2) recognizing the challenge of overfitting and the inability of traditional noise-based data augmentation methods to effectively enhance model robustness, a novel learnable Gaussian noise mechanism is introduced that allows the model to adaptively optimize perturbations for different locations, ensuring more effective learning; and (3) to address error accumulation in autoregressive prediction, as well as the learning difficulty and lack of intermediate data utilization in one-shot prediction, the authors propose a cascade prediction approach that resolves these problems while significantly improving forecasting performance. The method achieves a competitive result in the East China Regional AI Medium Range Weather Forecasting Competition. Ablation experiments further validate the effectiveness of each component, highlighting their contributions to enhancing prediction performance.
Abstract: [Objective] Leaf diseases significantly affect both the yield and quality of tea throughout the year. To address the insufficiently fine segmentation of current tea spot segmentation models, this research proposes a novel model for diagnosing the severity of tea spots, designated MDC-U-Net3+, which builds on the U-Net3+ framework to enhance segmentation accuracy. [Methods] A multi-scale feature fusion module (MSFFM) was incorporated into the backbone network of U-Net3+ to obtain feature information across multiple receptive fields of diseased spots, thereby reducing the loss of features within the encoder. Dual multi-scale attention (DMSA) was incorporated into the skip connection process to mitigate the segmentation boundary ambiguity issue. This integration facilitates the comprehensive fusion of fine-grained and coarse-grained semantic information at full scale. Furthermore, the segmented mask image was post-processed with conditional random fields (CRF) to further optimize the segmentation results. [Results and Discussions] The improved model MDC-U-Net3+ achieved a mean pixel accuracy (mPA) of 94.92% and a mean Intersection over Union (mIoU) of 90.9%. Compared with U-Net3+, the MDC-U-Net3+ model improved mPA and mIoU by 1.85 and 2.12 percentage points, respectively. These results illustrate more effective segmentation performance than that achieved by other classical semantic segmentation models. [Conclusions] The methodology presented herein could provide data support for automated disease detection and precise medication, consequently reducing the losses associated with tea diseases.
Funding: Supported by the Funds for Central-Guided Local Science and Technology Development (Grant No. 202407AC110005), Key Technologies for the Construction of a Whole-Process Intelligent Service System for Neuroendocrine Neoplasm, and by the 2023 Opening Research Fund of Yunnan Key Laboratory of Digital Communications (YNJTKFB-20230686, YNKLDC-KFKT-202304).
Abstract: In image analysis, high-precision semantic segmentation predominantly relies on supervised learning. Despite significant advancements driven by deep learning techniques, challenges such as class imbalance and dynamic performance evaluation persist. Traditional weighting methods, often based on pre-statistical class counting, tend to overemphasize certain classes while neglecting others, particularly rare sample categories. Approaches like focal loss and other rare-sample segmentation techniques introduce multiple hyperparameters that require manual tuning, leading to increased experimental costs due to their instability. This paper proposes a novel CAWASeg framework to address these limitations. Our approach leverages Grad-CAM technology to generate class activation maps, identifying key feature regions that the model focuses on during decision-making. We introduce a Comprehensive Segmentation Performance Score (CSPS) to dynamically evaluate model performance by converting these activation maps into pseudo masks and comparing them with the ground truth. Additionally, we design two adaptive weights for each class: a Basic Weight (BW) and a Ratio Weight (RW), which the model adjusts during training based on real-time feedback. Extensive experiments on the COCO-Stuff, CityScapes, and ADE20k datasets demonstrate that our CAWASeg framework significantly improves segmentation performance for rare sample categories while enhancing overall segmentation accuracy. The proposed method offers a robust and efficient solution for addressing class imbalance in semantic segmentation tasks.
Abstract: Weakly supervised semantic segmentation (WSSS) is a difficult task in which only category information is available for segmentation prediction. The key stage of WSSS is therefore generating pseudo labels. Convolutional neural network (CNN) based methods obtain pseudo labels through class activation mapping (CAM), which concentrates only on the most discriminative parts. Recent transformer-based methods use attention maps from the multi-headed self-attention (MHSA) module to predict pseudo labels, which usually contain obvious background noise and incoherent object areas. To solve these problems, we use the Conformer, a parallel network based on a CNN and a Transformer, as our backbone. The two branches generate pseudo labels and refine them independently, and can effectively combine the advantages of CNN and Transformer. However, the parallel structure does not exchange information closely enough between branches, which results in poor detail in the pseudo labels, and the background noise still exists. To alleviate this problem, we propose the enhancing convolution CAM (ECCAM) model, which has three improved modules based on enhancing convolution: a deeper stem (DStem), a convolutional feed-forward network (CFFN), and a feature coupling unit with convolution (FCUConv). ECCAM gives the Conformer tighter interaction between its CNN and Transformer branches. Experimental verification shows that the proposed modules help the network perceive more local information from images, making the final segmentation results more refined. Compared with similar architectures, our modules greatly improve semantic segmentation performance and achieve 70.2% mean intersection over union (mIoU) on the PASCAL VOC 2012 dataset.
Funding: National Natural Science Foundation of China (Nos. 42301473, 42271424, 42171397); Chinese Postdoctoral Innovation Talents Support Program (No. BX20230299); China Postdoctoral Science Foundation (No. 2023M742884); Natural Science Foundation of Sichuan Province (Nos. 24NSFSC2264, 2025ZNSFSC0322); Key Research and Development Project of Sichuan Province (No. 24ZDYF0633).
Abstract: Road tunnels are key nodes of modern transportation networks, and their information-based management is crucial to ensuring operational safety and traffic efficiency. However, existing tunnel vehicle modeling methods generally suffer from insufficient 3D scene description capability and low dynamic update efficiency, which makes it difficult to meet the demand for real-time, accurate management. For this reason, this paper proposes a vehicle twin modeling method for road tunnels. Starting from actual management needs, the approach supports multi-level dynamic modeling of vehicle type, size, and color by constructing a vehicle model library that can be flexibly invoked. At the same time, semantic constraint rules covering geometric layout, behavioral attributes, and spatial relationships are designed to ensure that the virtual model matches the real vehicle with a high degree of similarity. Finally, a prototype system is constructed and case experiments are conducted in a selected case area, integrating real-time monitoring data with the semantic constraints to achieve precise virtual-real mapping, dynamic updating, and three-dimensional visualization of vehicle states in the tunnel. The experiments show that the proposed method runs smoothly with an average rendering time of 17.70 ms while maintaining modeling accuracy (composite similarity of 0.867), which significantly improves the real-time performance and intuitiveness of tunnel management. The research results provide reliable technical support for intelligent operation and emergency response in road tunnels, and offer new ideas for digital twin modeling of complex scenes.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52305440, 52204263); the Natural Science Foundation of Changsha City (Grant Nos. kq2208272, kq2208274); the Tribology Science Fund of the State Key Laboratory of Tribology in Advanced Equipment (Grant SKLTKF22B09); and the National Key Research and Development Program of China (2022YFB3706902).
Abstract: Laser-directed energy deposition (L-DED) is an advanced additive manufacturing technology primarily adopted in metal three-dimensional printing systems. The L-DED process is characterized by various defects, thus necessitating the extensive use of in-situ monitoring to enable real-time adjustments of process parameters by detecting molten-pool features. To address the challenge of accurately extracting the molten-pool morphology from an undetached spatter, an innovative monitoring method based on the U-Net (U-shaped network) is proposed herein. A lightweight architecture accelerates the processing speed, whereas an enhanced loss function incorporating weight maps augments the segmentation precision. The model performance is evaluated by comparing its segmentation accuracy and processing speed with those of the conventional U-Net, using the mean intersection over union (MIoU) as the segmentation metric. The improved model demonstrates superior segmentation accuracy at the interface between the molten pool and spatter, with a peak MIoU of 0.9798 achieved on the test set. Furthermore, this model processes each image in 17.9 ms. Using this segmentation algorithm, the error in extracting the molten-pool width from single-track experiments is within 0.1 mm. The proposed method for monitoring the molten-pool morphology is suitable for deployment in online monitoring systems, thus providing a foundation for subsequent process-parameter regulation.
Funding: Funded by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (grant number 22KJD440001) and the Changzhou Science & Technology Program (grant number CJ20220232).
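The abstract above mentions a loss function that incorporates weight maps but does not give its form. As a hedged illustration of the general idea only, the sketch below implements a per-pixel weighted binary cross-entropy in which pixels with larger weights (for example, near a boundary) contribute more. The function name, the normalisation by the weight sum, and the toy values are all assumptions, not the paper's formulation.

```python
import math


def weighted_bce(probs, labels, weights, eps=1e-7):
    """Weighted binary cross-entropy over flat per-pixel sequences.

    probs:   predicted foreground probabilities in [0, 1]
    labels:  ground-truth labels, 0 or 1
    weights: non-negative per-pixel weights (the "weight map")
    """
    total, weight_sum = 0.0, 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    # Normalising by the weight sum is a design choice of this sketch.
    return total / weight_sum if weight_sum > 0 else 0.0


if __name__ == "__main__":
    probs, labels = [0.9, 0.1], [1, 1]  # second pixel is badly predicted
    uniform = weighted_bce(probs, labels, [1.0, 1.0])
    emphasised = weighted_bce(probs, labels, [1.0, 3.0])
    print(uniform < emphasised)  # True: up-weighting the hard pixel raises the loss
```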
Abstract: The Internet of Vehicles (IoV) has become an important direction in the field of intelligent transportation, in which vehicle positioning is a crucial part. Simultaneous Localization and Mapping (SLAM) technology plays a crucial role in vehicle localization and navigation. Traditional SLAM systems are designed for static environments, and their accuracy and robustness degrade in dynamic environments where objects are in constant movement. To address this issue, a new real-time visual SLAM system called MG-SLAM has been developed. Based on ORB-SLAM2, MG-SLAM incorporates a dynamic target detection process that enables the detection of both known and unknown moving objects. In this process, a separate semantic segmentation thread segments dynamic target instances, and the Mask R-CNN algorithm is run on the Graphics Processing Unit (GPU) to accelerate segmentation. To reduce computational cost, only key frames are segmented to identify known dynamic objects. Additionally, a multi-view geometry method is adopted to detect unknown moving objects. The results demonstrate that MG-SLAM achieves higher precision, improving from 0.2730 m to 0.0135 m. Moreover, the processing time required by MG-SLAM is significantly reduced compared to other dynamic scene SLAM algorithms, which illustrates its efficacy in locating objects in dynamic scenes.
Funding: The National Natural Science Foundation of China (42472194, 42302153, and 42002144) and the Fundamental Research Funds for the Central Universities (22CX06002A).
Abstract: Karst fractures serve as crucial seepage channels and storage spaces for carbonate natural gas reservoirs, and electrical image logs are vital data for visualizing and characterizing such fractures. However, the conventional approach of identifying fractures using electrical image logs predominantly relies on manual processes that are not only time-consuming but also highly subjective. In addition, the heterogeneity and strong dissolution tendency of karst carbonate reservoirs lead to complexity and variety in fracture geometry, which makes it difficult to accurately identify fractures. In this paper, the electrical image logs network (EILnet), a deep-learning-based intelligent semantic segmentation model with a selective attention mechanism and selective feature fusion module, was created to enable the intelligent identification and segmentation of different types of fractures through electrical logging images. Data from electrical image logs representing structural and induced fractures were first selected using the sliding window technique before image inpainting and data augmentation were implemented for these images to improve the generalizability of the model. Various image-processing tools, including the bilateral filter, Laplace operator, and Gaussian low-pass filter, were also applied to the electrical logging images to generate a multi-attribute dataset to help the model learn the semantic features of the fractures. The results demonstrated that the EILnet model outperforms mainstream deep-learning semantic segmentation models, such as Fully Convolutional Networks (FCN-8s), U-Net, and SegNet, for both the single-channel dataset and the multi-attribute dataset. The EILnet provided significant advantages for the single-channel dataset, and its mean intersection over union (MIoU) and pixel accuracy (PA) were 81.32% and 89.37%, respectively. In the case of the multi-attribute dataset, the identification capability of all models improved to varying degrees, with the EILnet achieving the highest MIoU and PA of 83.43% and 91.11%, respectively. Further, applying the EILnet model to various blind wells demonstrated its ability to provide reliable fracture identification, thereby indicating its promising potential applications.
Funding: Australian Research Council Project (FL-170100117).
Abstract: To avoid the laborious annotation process for dense prediction tasks like semantic segmentation, unsupervised domain adaptation (UDA) methods have been proposed to leverage the abundant annotations from a source domain, such as a virtual world (e.g., 3D games), and adapt models to the target domain (the real world) by narrowing the domain discrepancies. However, because of the large domain gap, directly aligning two distinct domains without considering the intermediates leads to inefficient alignment and inferior adaptation. To address this issue, we propose a novel learnable evolutionary Category Intermediates (CIs) guided UDA model named Leci, which enables information transfer between the two domains via two processes, i.e., Distilling and Blending. Starting from a random initialization, the CIs learn shared category-wise semantics automatically from the two domains in the Distilling process. Then, the learned semantics in the CIs are sent back to blend the domain features through a residual attentive fusion (RAF) module, such that the category-wise features of both domains shift towards each other. As the CIs progressively and consistently learn from the varying feature distributions during training, they evolve to guide the model towards category-wise feature alignment. Experiments on both GTA5 and SYNTHIA datasets demonstrate Leci's superiority over prior representative methods.
Funding: Shenzhen Institute of Artificial Intelligence and Robotics for Society, Grant/Award Number: AC01202201003-02; GuangDong Basic and Applied Basic Research Foundation, Grant/Award Number: 2024A1515010252; Longgang District Shenzhen's "Ten Action Plan" for Supporting Innovation Projects, Grant/Award Number: LGKCSDPT2024002.
Abstract: Audio-visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals, coupled with the complex spatial patterns of objects and textures found in visual images. Recent studies have predominantly focused on extracting features from diverse neural network structures, neglecting the acquisition of semantically meaningful regions and crucial components within audio-visual data. The authors present a feature pyramid attention network (FPANet) for audio-visual scene understanding, which extracts semantically significant characteristics from audio-visual data. The authors' approach builds multi-scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module (FPAM). A dimension alignment (DA) strategy is employed to align feature maps from multiple layers, a pyramid spatial attention (PSA) to spatially locate essential regions, and a pyramid channel attention (PCA) to pinpoint significant temporal frames. Experiments on visual scene classification (VSC), audio scene classification (ASC), and AVSC tasks demonstrate that FPANet achieves performance on par with state-of-the-art (SOTA) approaches, with a 95.9 F1-score on the ADVANCE dataset and a relative improvement of 28.8%. Visualisation results show that FPANet can prioritise semantically meaningful areas in audio-visual signals.
Funding: Supported by the project "GEF9874: Strengthening Coordinated Approaches to Reduce Invasive Alien Species (IAS) Threats to Globally Significant Agrobiodiversity and Agroecosystems in China" and by funding from the Excellent Talent Training Funding Project in Dongcheng District, Beijing, with project number 2024-dchrcpyzz-9.
Abstract: Ecological monitoring vehicles are equipped with a range of sensors and monitoring devices designed to gather data on ecological and environmental factors. These vehicles are crucial in various fields, including environmental science research, ecological and environmental monitoring projects, disaster response, and emergency management. A key method employed in these vehicles for achieving high-precision positioning is LiDAR (light detection and ranging)-Visual Simultaneous Localization and Mapping (SLAM). However, maintaining high-precision localization in complex scenarios, such as degraded environments or when dynamic objects are present, remains a significant challenge. To address this issue, we integrate both semantic and texture information from LiDAR and cameras to enhance the robustness and efficiency of data registration. Specifically, semantic information simplifies the modeling of scene elements, reducing the reliance on dense point clouds, which can be less efficient. Meanwhile, visual texture information complements LiDAR-Visual localization by providing additional contextual details. By incorporating semantic and texture details from paired images and point clouds, we significantly improve the quality of data association, thereby increasing the success rate of localization. This approach not only enhances the operational capabilities of ecological monitoring vehicles in complex environments but also contributes to improving the overall efficiency and effectiveness of ecological monitoring and environmental protection efforts.
Funding: Supported by the National Natural Science Foundation of China (No. 62241109) and the Tianjin Science and Technology Commissioner Project (No. 20YDTPJC01110).
Abstract: An improved model based on you only look once version 8 (YOLOv8) is proposed to solve the problem of low detection accuracy caused by the diversity of object sizes in optical remote sensing images. Firstly, the feature pyramid network (FPN) structure of the original YOLOv8 model is replaced by the generalized-FPN (GFPN) structure from GiraffeDet to realize "cross-layer" and "cross-scale" adaptive feature fusion, enriching the semantic and spatial information in the feature map and improving the target detection ability of the model. Secondly, a multi atrous spatial pyramid pooling (MASPP) module is designed, using atrous convolution and a feature pyramid structure to extract multi-scale features and improve the model's handling of multi-scale objects. The experimental results show that the detection accuracy of the improved YOLOv8 model on the DIOR dataset is 92% and its mean average precision (mAP) is 87.9%, which are 3.5% and 1.7% higher, respectively, than those of the original model. This shows that the detection and classification ability of the proposed model on multi-dimensional optical remote sensing targets has been improved.
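The MASPP module described above builds on atrous (dilated) convolution. As a small illustration of that building block only, and not of MASPP itself, the sketch below applies a 1D kernel with a configurable dilation and no padding; like most deep learning layers it computes a cross-correlation. The function name and the toy signal are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1D atrous convolution.

    Kernel taps are spaced `dilation` samples apart, so a kernel of length K
    covers a receptive field of (K - 1) * dilation + 1 input samples.
    """
    span = (len(kernel) - 1) * dilation
    out = []
    for start in range(len(signal) - span):
        out.append(sum(k * signal[start + i * dilation] for i, k in enumerate(kernel)))
    return out


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7]
    kernel = [1, 1, 1]
    print(dilated_conv1d(signal, kernel, dilation=1))  # [6, 9, 12, 15, 18], field of 3
    print(dilated_conv1d(signal, kernel, dilation=2))  # [9, 12, 15], field of 5
```

With the same three weights, raising the dilation from 1 to 2 widens the receptive field from 3 to 5 samples, which is why stacking several dilation rates is a common way to gather multi-scale context without adding parameters.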