The convolutional neural network (CNN) method based on DeepLabv3+ has several problems in the semantic segmentation of high-resolution remote sensing images, such as a fixed receptive field size for feature extraction, lack of semantic information, high decoder magnification, and insufficient detail retention. A hierarchical feature fusion network (HFFNet) was proposed. Firstly, a combination of transformer and CNN architectures was employed for feature extraction from images of varying resolutions. The extracted features were processed independently. Subsequently, the features from the transformer and CNN were fused under the guidance of features from different sources. This fusion process assisted in restoring information more comprehensively during the decoding stage. Furthermore, a spatial channel attention module was designed in the final stage of decoding to refine features and reduce the semantic gap between shallow CNN features and deep decoder features. The experimental results showed that HFFNet had superior performance on the UAVid, LoveDA, Potsdam, and Vaihingen datasets, and its cross-linking index was better than that of DeepLabv3+ and other competing methods, showing strong generalization ability.
In response to challenges posed by complex backgrounds, diverse target angles, and numerous small targets in remote sensing images, alongside the issue of high resource consumption hindering model deployment, we propose an enhanced, lightweight You Only Look Once version 8 small (YOLOv8s) detection algorithm. Regarding network improvements, we first replace traditional horizontal boxes with rotated boxes for target detection, effectively addressing difficulties in feature extraction caused by varying target angles. Second, we design a module integrating convolutional neural network (CNN) and Transformer components to replace specific C2f modules in the backbone network, thereby expanding the model’s receptive field and enhancing feature extraction in complex backgrounds. Finally, we introduce a feature calibration structure to mitigate potential feature mismatches during feature fusion. For model compression, we employ a lightweight channel pruning technique based on localized mean average precision (LMAP) to eliminate redundancies in the enhanced model. Although this approach results in some loss of detection accuracy, it effectively reduces the number of parameters, computational load, and model size. Additionally, we employ channel-level knowledge distillation to recover accuracy in the pruned model, further enhancing detection performance. Experimental results indicate that the enhanced algorithm achieves a 6.1% increase in mAP50 compared to YOLOv8s, while simultaneously reducing parameters, computational load, and model size by 57.7%, 28.8%, and 52.3%, respectively.
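To make the rotated-box idea in the abstract above concrete, here is a minimal sketch of one common parameterization: centre, width, height, and angle, converted to four corner points. The function name and the angle convention are assumptions for illustration; the abstract does not say which rotated-box encoding the authors use.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_rad):
    """Return the four corners of a w-by-h box centred at (cx, cy).

    The box is rotated by angle_rad counter-clockwise in standard
    mathematical axes (y pointing up). In image coordinates, where y
    points down, the same formula appears as a clockwise rotation.
    This convention is an assumption, not taken from the paper.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    # Corner offsets of the axis-aligned box relative to its centre.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Apply a 2D rotation to each offset, then translate to the centre.
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]
```

With `angle_rad = 0` the result is an ordinary horizontal box, so this encoding strictly generalizes the horizontal one.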
Remote sensing image super-resolution technology is pivotal for enhancing image quality in critical applications including environmental monitoring, urban planning, and disaster assessment. However, traditional methods exhibit deficiencies in detail recovery and noise suppression, particularly when processing complex landscapes (e.g., forests, farmlands), leading to artifacts and spectral distortions that limit practical utility. To address this, we propose an enhanced Super-Resolution Generative Adversarial Network (SRGAN) framework featuring three key innovations: (1) replacement of L1/L2 loss with a robust Charbonnier loss to suppress noise while preserving edge details via adaptive gradient balancing; (2) a multi-loss joint optimization strategy dynamically weighting Charbonnier loss (β = 0.5), Visual Geometry Group (VGG) perceptual loss (α = 1), and adversarial loss (γ = 0.1) to synergize pixel-level accuracy and perceptual quality; (3) a multi-scale residual network (MSRN) capturing cross-scale texture features (e.g., forest canopies, mountain contours). Validated on Sentinel-2 (10 m) and SPOT-6/7 (2.5 m) datasets covering 904 km² in Motuo County, Xizang, our method outperforms the SRGAN baseline (SR4RS) with Peak Signal-to-Noise Ratio (PSNR) gains of 0.29 dB and Structural Similarity Index (SSIM) improvements of 3.08% on forest imagery. Visual comparisons confirm enhanced texture continuity despite marginal Learned Perceptual Image Patch Similarity (LPIPS) increases. The method significantly improves noise robustness and edge retention in complex geomorphology, demonstrating 18% faster response in forest fire early warning and providing high-resolution support for agricultural/urban monitoring. Future work will integrate spectral constraints and lightweight architectures.
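The loss design in the abstract above can be sketched as follows, using the standard Charbonnier formulation sqrt((x − y)² + ε²) averaged over pixels and the stated weights β = 0.5, α = 1, γ = 0.1. The value of ε, the function names, and the use of fixed rather than dynamically adjusted weights are assumptions for illustration; the perceptual and adversarial terms are passed in as precomputed numbers.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty over two equal-length pixel sequences.

    eps is a hypothetical smoothing constant; the abstract does not give one.
    For large errors the penalty behaves like L1, for small errors like L2.
    """
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)


def total_loss(l_charb, l_vgg, l_adv, beta=0.5, alpha=1.0, gamma=0.1):
    """Weighted sum of the three loss terms, with the weights quoted in the abstract."""
    return beta * l_charb + alpha * l_vgg + gamma * l_adv
```

Because the Charbonnier gradient is bounded in magnitude, a single noisy pixel cannot dominate the update the way it can under a plain L2 loss.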
High-resolution remote sensing images (HRSIs) are now an essential data source for gathering surface information due to advancements in remote sensing data capture technologies. However, their significant scale changes and wealth of spatial details pose challenges for semantic segmentation. While convolutional neural networks (CNNs) excel at capturing local features, they are limited in modeling long-range dependencies. Conversely, transformers utilize multihead self-attention to integrate global context effectively, but this approach often incurs a high computational cost. This paper proposes a global-local multiscale context network (GLMCNet) to extract both global and local multiscale contextual information from HRSIs. A detail-enhanced filtering module (DEFM) is proposed at the end of the encoder to refine the encoder outputs further, thereby enhancing the key details extracted by the encoder and effectively suppressing redundant information. In addition, a global-local multiscale transformer block (GLMTB) is proposed in the decoding stage to enable the modeling of rich multiscale global and local information. We also design a stair fusion mechanism to transmit deep semantic information from deep to shallow layers progressively. Finally, we propose the semantic awareness enhancement module (SAEM), which further enhances the representation of multiscale semantic features through spatial attention and covariance channel attention. Extensive ablation analyses and comparative experiments were conducted to evaluate the performance of the proposed method. Specifically, our method achieved a mean Intersection over Union (mIoU) of 86.89% on the ISPRS Potsdam dataset and 84.34% on the ISPRS Vaihingen dataset, outperforming existing models such as ABCNet and BANet.
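Several abstracts in this listing report mIoU. As a reminder of what that figure measures, here is a minimal sketch that computes it from a confusion matrix. The function name and the choice to skip classes that never occur are assumptions; individual benchmarks may handle absent or ignored classes differently.

```python
def mean_iou(conf):
    """Mean Intersection over Union from a square confusion matrix.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j.
    """
    n = len(conf)
    ious = []
    for k in range(n):
        tp = conf[k][k]
        fn = sum(conf[k]) - tp                        # true k, predicted otherwise
        fp = sum(conf[i][k] for i in range(n)) - tp   # predicted k, true otherwise
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious)
```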
The large-scale acquisition and widespread application of remote sensing image data have led to increasingly severe challenges in information security and privacy protection during transmission and storage. Urban remote sensing images, characterized by complex content and well-defined structures, are particularly vulnerable to malicious attacks and information leakage. To address this issue, the author proposes an encryption method based on the enhanced single-neuron dynamical system (ESNDS). ESNDS generates high-quality pseudo-random sequences with complex dynamics and intense sensitivity to initial conditions, which drive a multi-stage cipher structure comprising permutation, ring-wise diffusion, and mask perturbation. Using representative GF-2 Panchromatic and Multispectral Scanner (PMS) urban scenes, the author conducts systematic evaluations in terms of inter-pixel correlation, information entropy, histogram uniformity, and number of pixels change rate (NPCR)/unified average changing intensity (UACI). The results demonstrate that the proposed scheme effectively resists statistical analysis, differential attacks, and known-plaintext attacks while maintaining competitive computational efficiency for high-resolution urban images. In addition, the cipher is lightweight and hardware-friendly, integrates readily with on-board and ground processing, and thus offers tangible engineering utility for real-time, large-volume remote-sensing data protection.
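The NPCR and UACI metrics named in the abstract above compare two cipher images obtained from plaintexts that differ in a single pixel. A minimal sketch of the standard definitions for 8-bit images follows; the function name and the nested-list image representation are assumptions for illustration.

```python
def npcr_uaci(cipher_a, cipher_b, max_val=255):
    """Return (NPCR, UACI) in percent for two equally sized grey images.

    Images are given as lists of rows of integer pixel values.
    NPCR: share of pixel positions whose values differ.
    UACI: mean absolute difference, normalized by the maximum pixel value.
    """
    pairs = [
        (a, b)
        for row_a, row_b in zip(cipher_a, cipher_b)
        for a, b in zip(row_a, row_b)
    ]
    n = len(pairs)
    npcr = 100.0 * sum(a != b for a, b in pairs) / n
    uaci = 100.0 * sum(abs(a - b) for a, b in pairs) / (max_val * n)
    return npcr, uaci
```

For an ideal cipher on 8-bit data, the usual reference values are roughly 99.61% for NPCR and 33.46% for UACI, which is the benchmark such evaluations are normally compared against.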
Semantic segmentation provides important technical support for land cover/land use (LCLU) research. By calculating the cosine similarity between feature vectors, transformer-based models can effectively capture the global information of high-resolution remote sensing images. However, the diversity of detailed and edge features within the same class of ground objects in high-resolution remote sensing images leads to a dispersed embedding distribution. The dispersed feature distribution enlarges feature vector angles and reduces cosine similarity, weakening the attention mechanism’s ability to identify the same class of ground objects. To address this challenge, a remote sensing image information granulation transformer for semantic segmentation is proposed. The model employs adaptive granulation to extract common semantic features among objects of the same class, constructing an information granule to replace the detailed feature representation of these objects. Then, the Laplacian operator of the information granule is applied to extract the edge features of the object as represented by the information granule. In the experiments, the proposed model was validated on the Beijing Land-Use (BLU), Gaofen Image Dataset (GID), and Potsdam Dataset (PD). In particular, the model achieves 88.81% for mOA, 82.64% for mF1, and 71.50% for mIoU on the GID dataset. Experimental results show that the model effectively handles high-resolution remote sensing images. Our code is available at https://github.com/sjmp525/RSIGT (accessed on 16 April 2025).
The objective of this study is to address semantic misalignment and insufficient accuracy in edge detail and discrimination detection, which are common issues in deep learning-based change detection methods relying on encoding and decoding frameworks. In response to this, we propose a model called FlowDual-PixelClsObjectMec (FPCNet), which innovatively incorporates dual flow alignment technology in the decoding stage to rectify semantic discrepancies through streamlined feature correction fusion. Furthermore, the model employs an object-level similarity measurement coupled with pixel-level classification in the PixelClsObjectMec (PCOM) module during the final discrimination stage, significantly enhancing edge detail detection and overall accuracy. Experimental evaluations on the change detection dataset (CDD) and building CDD demonstrate superior performance, with F1 scores of 95.1% and 92.8%, respectively. Our findings indicate that the FPCNet outperforms the existing algorithms in stability, robustness, and other key metrics.
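For readers less familiar with the F1 score quoted above, a minimal sketch of its computation from pixel counts of the "changed" class is given here. The function name and the zero-division handling are assumptions; the abstract does not describe the authors' evaluation code.

```python
def f1_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall.

    tp, fp, fn are counts of true positives, false positives, and false
    negatives. Returns 0.0 when precision or recall is undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```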
This paper introduces a lightweight remote sensing image dehazing network called multidimensional weight regulation network (MDWR-Net), which addresses the high computational cost of existing methods. Previous works, often based on the encoder-decoder structure and utilizing multiple upsampling and downsampling layers, are computationally expensive. To improve efficiency, the paper proposes two modules: the efficient spatial resolution recovery module (ESRR) for upsampling and the efficient depth information augmentation module (EDIA) for downsampling. These modules not only reduce model complexity but also enhance performance. Additionally, the partial feature weight learning module (PFWL) is introduced to reduce the computational burden by applying weight learning across partial dimensions, rather than using full-channel convolution. To overcome the limitations of convolutional neural network (CNN)-based networks, the haze distribution index transformer (HDIT) is integrated into the decoder. We also propose the physical-based non-adjacent feature fusion module (PNFF), which leverages the atmospheric scattering model to improve generalization of our MDWR-Net. The MDWR-Net achieves superior dehazing performance with a computational cost of just 2.98×10^9 multiply-accumulate operations (MACs), which is less than one-tenth of previous methods. Experimental results validate its effectiveness in balancing performance and computational efficiency.
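The atmospheric scattering model mentioned in the abstract above is commonly written as I = J·t + A·(1 − t), where I is the hazy observation, J the haze-free scene radiance, t the transmission, and A the global atmospheric light. The sketch below shows the forward model and its inversion for a single pixel value. It illustrates the general model only: how PNFF actually embeds it is not described in the abstract, and the function names and the `t_min` floor are assumptions.

```python
def add_haze(j, t, a):
    """Forward model: hazy intensity from clear intensity j, transmission t, airlight a."""
    return j * t + a * (1.0 - t)


def remove_haze(i, t, a, t_min=0.1):
    """Invert the model for the clear intensity.

    t_min is a hypothetical lower bound on transmission that keeps the
    division stable where haze is very dense.
    """
    return (i - a) / max(t, t_min) + a
```

A round trip such as `remove_haze(add_haze(0.2, 0.5, 0.9), 0.5, 0.9)` recovers the original intensity up to floating-point error, provided `t` is above `t_min`.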
The classification of Chinese traditional settlements (CTSs) is extremely important for their differentiated development and protection. The innovative double-branch classification model developed in this study comprehensively utilized the features of remote sensing (RS) images and building facade pictures (BFPs). This approach was able to overcome the limitations of previous methods that used only building facade images to classify settlements. First, the features of the roofs and walls were extracted using a double-branch structure, which consisted of an RS image branch and a BFP branch. Then, a feature fusion module was designed to fuse the features of the roofs and walls. The precision, recall, and F1-score of the proposed model were improved by more than 4% compared with the classification model using only RS images or BFPs. The same three indexes of the proposed model were improved by more than 2% compared with other deep learning models. The results demonstrated that the proposed model performed well in the classification of architectural styles in CTSs.
The frequent occurrence of extreme weather events has made landslides a global natural disaster issue. It is crucial to rapidly and accurately determine the boundaries of landslides for geohazard evaluation and emergency response. Therefore, the Skip Connection DeepLab neural network (SCDnn), a deep learning model based on 770 optical remote sensing images of landslides, is proposed to improve the accuracy of landslide boundary detection. The SCDnn model is optimized for the over-segmentation issue that occurs in conventional deep learning models when there is a significant degree of similarity between topographical geomorphic features. SCDnn exhibits notable improvements in landslide feature extraction and semantic segmentation by combining an enhanced Atrous Spatial Pyramid Convolutional Block (ASPC) with a coding structure that reduces model complexity. The experimental results demonstrate that SCDnn identifies landslide boundaries with MIoU values between 0.8 and 0.9 in 119 images and with MIoU values exceeding 0.9 in 52 images, which exceeds the identification accuracy of existing techniques. This work can offer a novel technique for the automatic, extensive identification of landslide boundaries in remote sensing images, in addition to establishing the groundwork for future investigations and applications in related domains.
Remote sensing images carry crucial ground information, often involving the spatial distribution and spatiotemporal changes of surface elements. To safeguard this sensitive data, image encryption technology is essential. In this paper, a novel Fibonacci sine exponential map is designed, the hyperchaotic performance of which is particularly suitable for image encryption algorithms. An encryption algorithm tailored for handling the multi-band attributes of remote sensing images is proposed. The algorithm combines a three-dimensional synchronized scrambled diffusion operation with chaos to efficiently encrypt multiple images. Moreover, the keys are processed using an elliptic curve cryptosystem, eliminating the need for an additional channel to transmit the keys, thus enhancing security. Experimental results and algorithm analysis demonstrate that the algorithm offers strong security and high efficiency, making it suitable for remote sensing image encryption tasks.
Remote sensing image object detection is one of the core tasks of remote sensing image processing. In recent years, with the development of deep learning, great progress has been made in object detection in remote sensing. However, dense small targets, complex backgrounds, and poor target positioning accuracy in remote sensing images still make the detection of remote sensing targets difficult. To solve these problems, this research proposes a remote sensing image object detection algorithm based on an improved YOLOX-S. Firstly, the Efficient Channel Attention (ECA) module is introduced to improve the network's ability to extract features from the image and suppress useless information such as background. Secondly, the loss function is optimized to improve the regression accuracy of the target bounding box. We evaluate the effectiveness of our algorithm on the NWPU VHR-10 remote sensing image dataset. The experimental results show that the detection accuracy of the algorithm can reach 95.5% without increasing the number of parameters. This is a significant improvement over the original YOLOX-S network, and the detection performance is much better than that of some other mainstream remote sensing image detection methods. In addition, our method shows good generalization performance in experiments on aircraft images in the RSOD dataset.
Significant advancements have been achieved in road surface extraction based on high-resolution remote sensing image processing. Most current methods rely on fully supervised learning, which necessitates enormous human effort to label the images. Within this field, other research endeavors utilize weakly supervised methods. These approaches aim to reduce the expenses associated with annotation by leveraging sparsely annotated data, such as scribbles. This paper presents a novel technique called a weakly supervised network using scribble-supervised and edge-mask (WSSE-net). This network is a three-branch architecture, whereby each branch is equipped with a distinct decoder module dedicated to road extraction tasks. One of the branches is dedicated to generating edge masks using edge detection algorithms and optimizing road edge details. The other two branches supervise the model’s training by employing scribble labels and spreading scribble information throughout the image. To address the long-standing flaw that generated pseudo-labels are not updated as the network trains, we use mixup to blend prediction results dynamically and continually update new pseudo-labels to steer network training. Our solution demonstrates efficient operation by simultaneously considering both edge-mask aid and dynamic pseudo-label support. The studies are conducted on three separate road datasets, which consist primarily of high-resolution remote-sensing satellite photos and drone images. The experimental findings suggest that our methodology performs better than advanced scribble-supervised approaches and specific traditional fully supervised methods.
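The dynamic pseudo-label update described in the abstract above could look roughly like the sketch below: two per-pixel road-probability predictions are blended with a mixup coefficient λ, and the blend serves as the refreshed pseudo-label. Everything here is an assumption for illustration (which predictions are blended, how λ is drawn, and the function names); the abstract gives no such details.

```python
import random


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixup coefficient from Beta(alpha, alpha), as in standard mixup."""
    return rng.betavariate(alpha, alpha)


def blend_predictions(pred_a, pred_b, lam):
    """Convex combination of two per-pixel probability lists.

    pred_a and pred_b stand in for the outputs of two hypothetical
    decoder branches; lam in [0, 1] weights the first one.
    """
    return [lam * a + (1.0 - lam) * b for a, b in zip(pred_a, pred_b)]
```

Calling `blend_predictions` with a fresh `sample_lambda()` at each training step would give pseudo-labels that change as the branch predictions improve, which is the property the abstract emphasizes.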
The degradation of optical remote sensing images due to atmospheric haze poses a significant obstacle, profoundly impeding their effective utilization across various domains. Dehazing methodologies have emerged as pivotal components of image preprocessing, fostering an improvement in the quality of remote sensing imagery. This enhancement renders remote sensing data more indispensable, thereby enhancing the accuracy of target identification. Conventional defogging techniques based on simplistic atmospheric degradation models have proven inadequate for mitigating non-uniform haze within remotely sensed images. In response to this challenge, a novel UNet Residual Attention Network (URA-Net) is proposed. This paradigmatic approach materializes as an end-to-end convolutional neural network distinguished by its utilization of multi-scale dense feature fusion clusters and gated jump connections. The essence of our methodology lies in local feature fusion within dense residual clusters, enabling the extraction of pertinent features from both preceding and current local data, depending on contextual demands. The intelligently orchestrated gated structures facilitate the propagation of these features to the decoder, resulting in superior outcomes in haze removal. Empirical validation through a plethora of experiments substantiates the efficacy of URA-Net, demonstrating its superior performance compared to existing methods when applied to established datasets for remote sensing image defogging. On the RICE-1 dataset, URA-Net achieves a Peak Signal-to-Noise Ratio (PSNR) of 29.07 dB, surpassing the Dark Channel Prior (DCP) by 11.17 dB, the All-in-One Network for Dehazing (AOD) by 7.82 dB, the Optimal Transmission Map and Adaptive Atmospheric Light For Dehazing (OTM-AAL) by 5.37 dB, the Unsupervised Single Image Dehazing (USID) by 8.0 dB, and the Superpixel-based Remote Sensing Image Dehazing (SRD) by 8.5 dB. Particularly noteworthy, on the SateHaze1k dataset, URA-Net attains preeminence in overall performance, yielding defogged images characterized by consistent visual quality. This underscores the contribution of the research to the advancement of remote sensing technology, providing a robust and efficient solution for alleviating the adverse effects of haze on image quality.
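The PSNR figures in the abstract above follow the standard definition PSNR = 10·log10(MAX² / MSE). A minimal sketch for flattened 8-bit images is given here; the function name and the flat-list image representation are assumptions for illustration.

```python
import math


def psnr(reference, restored, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Because the scale is logarithmic, each 10 dB of PSNR gain corresponds to a tenfold reduction in mean squared error.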
Extracting building contours from aerial images is a fundamental task in remote sensing. Current building extraction methods cannot accurately extract building contour information and make errors when extracting small-scale buildings. This paper introduces a novel dense feature iterative (DFI) fusion network, denoted as DFINet, for extracting building contours. The network uses a DFI decoder to fuse semantic information at different scales and learn building contour knowledge, producing the final features through iterative fusion. The dense feature fusion (DFF) module combines features at multiple scales. We employ the contour reconstruction (CR) module to obtain the final predictions. Extensive experiments validate the effectiveness of DFINet on two different remote sensing datasets, the INRIA aerial image dataset and the Wuhan University (WHU) building dataset. On the INRIA aerial image dataset, our method achieves the highest intersection over union (IoU), overall accuracy (OA), and F1 scores compared to other state-of-the-art methods.
Information on Land Use and Land Cover Map (LULCM) is essential for environmental and socioeconomic applications. Such maps are generally derived from Multispectral Remote Sensing Images (MRSI) via classification. The classification process can be described as information flow from images to maps through a trained classifier. Characterizing the information flow is essential for understanding the classification mechanism, providing solutions that address such theoretical issues as “what is the maximum number of classes that can be classified from a given MRSI?” and “how much information gain can be obtained?” Consequently, two interesting questions naturally arise, i.e., (i) How can we characterize the information flow? and (ii) What is the mathematical form of the information flow? To answer these two questions, this study first hypothesizes that thermodynamic entropy is the appropriate measure of information for both MRSI and LULCM. This hypothesis is then supported by kinetic-theory-based experiments. Thereafter, upon such an entropy, a generalized Jarzynski equation is formulated to mathematically model the information flow, which contains such parameters as thermodynamic entropy of MRSI, thermodynamic entropy of LULCM, weighted F1-score (classification accuracy), and total number of classes. This generalized Jarzynski equation has been successfully validated by hypothesis-driven experiments where 694 Sentinel-2 images are classified into 10 classes by four classical classifiers. This study provides a way for linking thermodynamic laws and concepts to the characterization and understanding of information flow in land cover classification, opening a new door for constructing domain knowledge.
This study aimed to establish a forest resources management information system for forest farms based on WebGIS with a browser/server (B/S) structure. The trial forest farm of the Hunan Academy of Forestry served as the study area, and forest resources field survey data, ETM+ remote sensing data, and basic geographical information data served as the research material. The work comprised extracting forest resource data for the forest farm, analyzing the functional requirements of the system, and establishing the required software and hardware environment, with the aim of realizing the management, query, editing, analysis, statistics, and other functions of forest resources information in order to manage the forest resources.
In the field of satellite imagery, remote sensing image captioning (RSIC) is a hot topic that faces the challenges of overfitting and difficult image-text alignment. To address these issues, this paper proposes a vision-language aligning paradigm for RSIC to jointly represent vision and language. First, a new RSIC dataset, DIOR-Captions, is built by augmenting the object detection in optical remote sensing images (DIOR) dataset with manually annotated Chinese and English contents. Second, a Vision-Language aligning model with Cross-modal Attention (VLCA) is presented to generate accurate and abundant bilingual descriptions for remote sensing images. Third, a cross-modal learning network is introduced to address the problem of visual-lingual alignment. Notably, VLCA is also applied to end-to-end Chinese caption generation by using a pre-trained Chinese language model. The experiments are carried out with various baselines to validate VLCA on the proposed dataset. The results demonstrate that the proposed algorithm produces more descriptive and informative captions than existing algorithms.
Landslides, collapses, and cracks are the main types of geological hazards, which threaten the safety of human life and property at all times. In emergency surveying and mapping, field investigation and manual recognition are time-consuming and laborious, while identifying ground hazards from satellite images suffers from problems such as time lag, low resolution, and difficulty in selecting maps on demand. In this paper, photogrammetry at a resolution of 10 cm per pixel is carried out with a DJ 4 UAV over a geological hazard-prone area of Taohuagou, Shanxi Province, China. The digital orthophoto model (DOM), digital surface model (DSM), and three-dimensional point cloud model (3DPCM) of this region are generated. Two visual interpretation methods are proposed: cracks are interpreted from the DOM (main) with the 3DPCM (auxiliary), and landslides and collapses are interpreted from the 3DPCM (main) with the DOM and DSM (auxiliary). Based on the low-altitude UAV remote sensing images, the shape characteristics, geological characteristics, and distribution of the identified hazards are analyzed. The results show that, using low-altitude UAV remote sensing images, the combination of main and auxiliary data can quickly and accurately identify landslides, collapses, and cracks: the accuracy of crack identification is 93%, and the accuracy of landslide and collapse identification is 100%. The hazards mainly occur in silty clay and mudstone geology and are greatly affected by slope foot excavation. This study can play a great role in the recognition of sudden hazards from low-altitude UAV remote sensing images.
Although the Convolutional Neural Network (CNN) has shown great potential for land cover classification, the frequently used single-scale convolution kernel limits the scope of information extraction. Therefore, we propose a Multi-Scale Fully Convolutional Network (MSFCN) with a multi-scale convolutional kernel as well as a Channel Attention Block (CAB) and a Global Pooling Module (GPM) in this paper to exploit discriminative representations from two-dimensional (2D) satellite images. Meanwhile, to explore the ability of the proposed MSFCN for spatio-temporal images, we expand our MSFCN to three dimensions using a three-dimensional (3D) CNN, capable of harnessing each land cover category’s time series interaction from the reshaped spatio-temporal remote sensing images. To verify the effectiveness of the proposed MSFCN, we conduct experiments on two spatial datasets and two spatio-temporal datasets. The proposed MSFCN achieves 60.366% on the WHDLD dataset and 75.127% on the GID dataset in terms of mIoU, while the figures for the two spatio-temporal datasets are 87.753% and 77.156%. Extensive comparative experiments and ablation studies demonstrate the effectiveness of the proposed MSFCN.
Funding: Supported by the National Natural Science Foundation of China (No. 52374155) and the Anhui Provincial Natural Science Foundation (No. 2308085 MF218).
Abstract: The convolutional neural network (CNN) method based on DeepLabv3+ has several problems in the semantic segmentation of high-resolution remote sensing images, such as a fixed receptive field size during feature extraction, a lack of semantic information, a high decoder upsampling factor, and insufficient retention of detail. A hierarchical feature fusion network (HFFNet) was proposed. Firstly, a combination of transformer and CNN architectures was employed for feature extraction from images of varying resolutions. The extracted features were processed independently. Subsequently, the features from the transformer and CNN were fused under the guidance of features from different sources. This fusion process assisted in restoring information more comprehensively during the decoding stage. Furthermore, a spatial channel attention module was designed in the final stage of decoding to refine features and reduce the semantic gap between shallow CNN features and deep decoder features. The experimental results showed that HFFNet had superior performance on the UAVid, LoveDA, Potsdam, and Vaihingen datasets, and its intersection over union (IoU) was better than that of DeepLabv3+ and other competing methods, showing strong generalization ability.
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 52472334, U2368204).
Abstract: In response to challenges posed by complex backgrounds, diverse target angles, and numerous small targets in remote sensing images, alongside the issue of high resource consumption hindering model deployment, we propose an enhanced, lightweight you only look once version 8 small (YOLOv8s) detection algorithm. Regarding network improvements, we first replace traditional horizontal boxes with rotated boxes for target detection, effectively addressing difficulties in feature extraction caused by varying target angles. Second, we design a module integrating convolutional neural network (CNN) and Transformer components to replace specific C2f modules in the backbone network, thereby expanding the model’s receptive field and enhancing feature extraction in complex backgrounds. Finally, we introduce a feature calibration structure to mitigate potential feature mismatches during feature fusion. For model compression, we employ a lightweight channel pruning technique based on localized mean average precision (LMAP) to eliminate redundancies in the enhanced model. Although this approach results in some loss of detection accuracy, it effectively reduces the number of parameters, computational load, and model size. Additionally, we employ channel-level knowledge distillation to recover accuracy in the pruned model, further enhancing detection performance. Experimental results indicate that the enhanced algorithm achieves a 6.1% increase in mAP50 compared to YOLOv8s, while simultaneously reducing parameters, computational load, and model size by 57.7%, 28.8%, and 52.3%, respectively.
Funding: This study was supported by the Inner Mongolia Academy of Forestry Sciences Open Research Project (Grant No. KF2024MS03); the Project to Improve the Scientific Research Capacity of the Inner Mongolia Academy of Forestry Sciences (Grant No. 2024NLTS04); and the Innovation and Entrepreneurship Training Program for Undergraduates of Beijing Forestry University (Grant No. X202410022268).
Abstract: Remote sensing image super-resolution technology is pivotal for enhancing image quality in critical applications including environmental monitoring, urban planning, and disaster assessment. However, traditional methods exhibit deficiencies in detail recovery and noise suppression, particularly when processing complex landscapes (e.g., forests, farmlands), leading to artifacts and spectral distortions that limit practical utility. To address this, we propose an enhanced Super-Resolution Generative Adversarial Network (SRGAN) framework featuring three key innovations: (1) replacement of L1/L2 loss with a robust Charbonnier loss to suppress noise while preserving edge details via adaptive gradient balancing; (2) a multi-loss joint optimization strategy dynamically weighting Charbonnier loss (β=0.5), Visual Geometry Group (VGG) perceptual loss (α=1), and adversarial loss (γ=0.1) to synergize pixel-level accuracy and perceptual quality; (3) a multi-scale residual network (MSRN) capturing cross-scale texture features (e.g., forest canopies, mountain contours). Validated on Sentinel-2 (10 m) and SPOT-6/7 (2.5 m) datasets covering 904 km² in Motuo County, Xizang, our method outperforms the SRGAN baseline (SR4RS) with Peak Signal-to-Noise Ratio (PSNR) gains of 0.29 dB and Structural Similarity Index (SSIM) improvements of 3.08% on forest imagery. Visual comparisons confirm enhanced texture continuity despite marginal Learned Perceptual Image Patch Similarity (LPIPS) increases. The method significantly improves noise robustness and edge retention in complex geomorphology, demonstrating 18% faster response in forest fire early warning and providing high-resolution support for agricultural/urban monitoring. Future work will integrate spectral constraints and lightweight architectures.
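The weighted objective described in this abstract can be illustrated with a minimal sketch. It assumes the common Charbonnier form sqrt((x - y)^2 + eps^2) averaged over pixels, and it uses the fixed weights quoted in the abstract (α=1, β=0.5, γ=0.1); the abstract mentions dynamic weighting, which is not modeled here. The perceptual and adversarial terms are passed in as plain numbers, and the function names and the `eps` default are illustrative assumptions, not the authors' implementation.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt((p - t)^2 + eps^2) over two flat pixel lists.

    eps is an assumed smoothing constant; the abstract does not state its value.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)


def joint_loss(charbonnier, perceptual, adversarial,
               alpha=1.0, beta=0.5, gamma=0.1):
    """Fixed-weight sum: alpha * perceptual + beta * charbonnier + gamma * adversarial.

    The weights are the values quoted in the abstract; how they are adapted
    during training is not described there, so they stay constant in this sketch.
    """
    return alpha * perceptual + beta * charbonnier + gamma * adversarial


if __name__ == "__main__":
    pixel_term = charbonnier_loss([0.2, 0.5, 0.9], [0.25, 0.45, 1.0])
    # The perceptual and adversarial values below are made-up placeholders.
    print(joint_loss(pixel_term, perceptual=0.8, adversarial=0.3))
```

Unlike a squared error, the Charbonnier penalty grows roughly linearly for large residuals, which is the usual argument for its robustness to noise and outliers.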
Funding: Provided by the Science Research Project of Hebei Education Department under Grant No. BJK2024115.
Abstract: High-resolution remote sensing images (HRSIs) are now an essential data source for gathering surface information due to advancements in remote sensing data capture technologies. However, their significant scale changes and wealth of spatial details pose challenges for semantic segmentation. While convolutional neural networks (CNNs) excel at capturing local features, they are limited in modeling long-range dependencies. Conversely, transformers utilize multihead self-attention to integrate global context effectively, but this approach often incurs a high computational cost. This paper proposes a global-local multiscale context network (GLMCNet) to extract both global and local multiscale contextual information from HRSIs. A detail-enhanced filtering module (DEFM) is proposed at the end of the encoder to refine the encoder outputs further, thereby enhancing the key details extracted by the encoder and effectively suppressing redundant information. In addition, a global-local multiscale transformer block (GLMTB) is proposed in the decoding stage to enable the modeling of rich multiscale global and local information. We also design a stair fusion mechanism to transmit deep semantic information from deep to shallow layers progressively. Finally, we propose the semantic awareness enhancement module (SAEM), which further enhances the representation of multiscale semantic features through spatial attention and covariance channel attention. Extensive ablation analyses and comparative experiments were conducted to evaluate the performance of the proposed method. Specifically, our method achieved a mean Intersection over Union (mIoU) of 86.89% on the ISPRS Potsdam dataset and 84.34% on the ISPRS Vaihingen dataset, outperforming existing models such as ABCNet and BANet.
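For readers unfamiliar with the mIoU metric reported above, here is a minimal sketch of how it is usually computed from a confusion matrix: per-class IoU is TP / (TP + FP + FN), averaged over the classes that occur. The function name and the choice to skip classes with an empty union are assumptions for illustration; the evaluation protocol actually used in the paper is not given in the abstract.

```python
def mean_iou(confusion):
    """Mean intersection over union from a square confusion matrix.

    confusion[i][j] counts pixels whose true class is i and predicted class is j.
    Classes that appear in neither the labels nor the predictions are skipped
    (an assumed convention; some toolkits count them as 0 instead).
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true c, predicted other
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true other
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Two-class toy example: one pixel of class 0 is mislabeled as class 1.
    print(mean_iou([[1, 1], [0, 2]]))
```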
Abstract: The large-scale acquisition and widespread application of remote sensing image data have led to increasingly severe challenges in information security and privacy protection during transmission and storage. Urban remote sensing images, characterized by complex content and well-defined structures, are particularly vulnerable to malicious attacks and information leakage. To address this issue, the author proposes an encryption method based on the enhanced single-neuron dynamical system (ESNDS). ESNDS generates high-quality pseudo-random sequences with complex dynamics and intense sensitivity to initial conditions, which drive a multi-stage cipher structure comprising permutation, ring-wise diffusion, and mask perturbation. Using representative GF-2 Panchromatic and Multispectral Scanner (PMS) urban scenes, the author conducts systematic evaluations in terms of inter-pixel correlation, information entropy, histogram uniformity, and number of pixels change rate (NPCR)/unified average changing intensity (UACI). The results demonstrate that the proposed scheme effectively resists statistical analysis, differential attacks, and known-plaintext attacks while maintaining competitive computational efficiency for high-resolution urban images. In addition, the cipher is lightweight and hardware-friendly, integrates readily with on-board and ground processing, and thus offers tangible engineering utility for real-time, large-volume remote sensing data protection.
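The two differential-attack metrics named in this abstract have standard definitions, sketched below for 8-bit images stored as flat lists: NPCR is the percentage of positions at which two cipher images differ, and UACI is the mean absolute difference normalized by 255, also as a percentage. The function names are illustrative, and this is not the author's evaluation code.

```python
def npcr(cipher_a, cipher_b):
    """Percentage of pixel positions where two equally sized cipher images differ."""
    if len(cipher_a) != len(cipher_b) or not cipher_a:
        raise ValueError("images must be non-empty and equally sized")
    changed = sum(1 for a, b in zip(cipher_a, cipher_b) if a != b)
    return 100.0 * changed / len(cipher_a)


def uaci(cipher_a, cipher_b, max_value=255):
    """Mean absolute intensity difference, normalized by max_value, in percent."""
    if len(cipher_a) != len(cipher_b) or not cipher_a:
        raise ValueError("images must be non-empty and equally sized")
    total = sum(abs(a - b) for a, b in zip(cipher_a, cipher_b))
    return 100.0 * total / (max_value * len(cipher_a))


if __name__ == "__main__":
    # Toy 2x2 "images" flattened to lists; real tests encrypt two plaintexts
    # that differ in a single pixel and compare the resulting cipher images.
    first = [12, 200, 34, 90]
    second = [12, 17, 250, 91]
    print(npcr(first, second), uaci(first, second))
```

In a typical test, a scheme is considered resistant to differential attacks when both values stay close to those expected for two independent random images.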
Funding: Supported by the National Natural Science Foundation of China (62462040), the Yunnan Fundamental Research Projects (202501AT070345), the Major Science and Technology Projects in Yunnan Province (202202AD080013), the Sichuan Provincial Key Laboratory of Philosophy and Social Science Key Program on Language Intelligence Special Education (YYZN-2024-1), and the Photosynthesis Fund Class A (ghfund202407010460).
Abstract: Semantic segmentation provides important technical support for land cover/land use (LCLU) research. By calculating the cosine similarity between feature vectors, transformer-based models can effectively capture the global information of high-resolution remote sensing images. However, the diversity of detailed and edge features within the same class of ground objects in high-resolution remote sensing images leads to a dispersed embedding distribution. The dispersed feature distribution enlarges feature vector angles and reduces cosine similarity, weakening the attention mechanism’s ability to identify the same class of ground objects. To address this challenge, a remote sensing image information granulation transformer for semantic segmentation is proposed. The model employs adaptive granulation to extract common semantic features among objects of the same class, constructing an information granule to replace the detailed feature representation of these objects. Then, the Laplacian operator of the information granule is applied to extract the edge features of the object as represented by the information granule. In the experiments, the proposed model was validated on the Beijing Land-Use (BLU), Gaofen Image Dataset (GID), and Potsdam Dataset (PD). In particular, the model achieves 88.81% for mOA, 82.64% for mF1, and 71.50% for mIoU metrics on the GID dataset. Experimental results show that the model effectively handles high-resolution remote sensing images. Our code is available at https://github.com/sjmp525/RSIGT (accessed on 16 April 2025).
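The claim that a larger angle between feature vectors lowers their cosine similarity can be checked with a small sketch. The helper name is illustrative, and the two-dimensional vectors are toy values rather than features from the paper's model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    reference = [1.0, 0.0]
    # Rotate a second unit vector away from the reference: the similarity falls
    # as the angle grows, which is the dispersion effect the abstract describes.
    for degrees in (0, 30, 60, 90):
        angle = math.radians(degrees)
        other = [math.cos(angle), math.sin(angle)]
        print(degrees, round(cosine_similarity(reference, other), 4))
```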
Abstract: The objective of this study is to address semantic misalignment and insufficient accuracy in edge detail and discrimination detection, which are common issues in deep learning-based change detection methods relying on encoding and decoding frameworks. In response to this, we propose a model called FlowDual-PixelClsObjectMec (FPCNet), which innovatively incorporates dual flow alignment technology in the decoding stage to rectify semantic discrepancies through streamlined feature correction fusion. Furthermore, the model employs an object-level similarity measurement coupled with pixel-level classification in the PixelClsObjectMec (PCOM) module during the final discrimination stage, significantly enhancing edge detail detection and overall accuracy. Experimental evaluations on the change detection dataset (CDD) and building CDD demonstrate superior performance, with F1 scores of 95.1% and 92.8%, respectively. Our findings indicate that the FPCNet outperforms the existing algorithms in stability, robustness, and other key metrics.
Abstract: This paper introduces a lightweight remote sensing image dehazing network called multidimensional weight regulation network (MDWR-Net), which addresses the high computational cost of existing methods. Previous works, often based on the encoder-decoder structure and utilizing multiple upsampling and downsampling layers, are computationally expensive. To improve efficiency, the paper proposes two modules: the efficient spatial resolution recovery module (ESRR) for upsampling and the efficient depth information augmentation module (EDIA) for downsampling. These modules not only reduce model complexity but also enhance performance. Additionally, the partial feature weight learning module (PFWL) is introduced to reduce the computational burden by applying weight learning across partial dimensions, rather than using full-channel convolution. To overcome the limitations of convolutional neural network (CNN)-based networks, the haze distribution index transformer (HDIT) is integrated into the decoder. We also propose the physical-based non-adjacent feature fusion module (PNFF), which leverages the atmospheric scattering model to improve generalization of our MDWR-Net. The MDWR-Net achieves superior dehazing performance with a computational cost of just 2.98×10^9 multiply-accumulate operations (MACs), which is less than one-tenth of previous methods. Experimental results validate its effectiveness in balancing performance and computational efficiency.
Funding: The Science and Technology Project of Hebei Education Department, No. BJK2022031; the Open Fund of Hebei Key Laboratory of Geological Resources and Environmental Monitoring and Protection, No. JCYKT202310.
Abstract: The classification of Chinese traditional settlements (CTSs) is extremely important for their differentiated development and protection. The innovative double-branch classification model developed in this study comprehensively utilized the features of remote sensing (RS) images and building facade pictures (BFPs). This approach was able to overcome the limitations of previous methods that used only building facade images to classify settlements. First, the features of the roofs and walls were extracted using a double-branch structure, which consisted of an RS image branch and a BFP branch. Then, a feature fusion module was designed to fuse the features of the roofs and walls. The precision, recall, and F1-score of the proposed model were improved by more than 4% compared with the classification model using only RS images or BFPs. The same three indexes of the proposed model were improved by more than 2% compared with other deep learning models. The results demonstrated that the proposed model performed well in the classification of architectural styles in CTSs.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 42090054, 41931295) and the Natural Science Foundation of Hubei Province of China (2022CFA002).
Abstract: The frequent occurrence of extreme weather events has made landslides a global natural disaster issue. It is crucial to rapidly and accurately determine the boundaries of landslides for geohazard evaluation and emergency response. Therefore, the Skip Connection DeepLab neural network (SCDnn), a deep learning model based on 770 optical remote sensing images of landslides, is proposed to improve the accuracy of landslide boundary detection. The SCDnn model is optimized for the over-segmentation issue that occurs in conventional deep learning models when there is a significant degree of similarity between topographical geomorphic features. SCDnn exhibits notable improvements in landslide feature extraction and semantic segmentation by combining an enhanced Atrous Spatial Pyramid Convolutional Block (ASPC) with a coding structure that reduces model complexity. The experimental results demonstrate that SCDnn can identify landslide boundaries in 119 images with MIoU values between 0.8 and 0.9, and in 52 images with MIoU values exceeding 0.9, which exceeds the identification accuracy of existing techniques. This work can offer a novel technique for the automatic extensive identification of landslide boundaries in remote sensing images, in addition to establishing the groundwork for future investigations and applications in related domains.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 91948303).
Abstract: Remote sensing images carry crucial ground information, often involving the spatial distribution and spatiotemporal changes of surface elements. To safeguard this sensitive data, image encryption technology is essential. In this paper, a novel Fibonacci sine exponential map is designed, the hyperchaotic performance of which is particularly suitable for image encryption algorithms. An encryption algorithm tailored for handling the multi-band attributes of remote sensing images is proposed. The algorithm combines a three-dimensional synchronized scrambled diffusion operation with chaos to efficiently encrypt multiple images. Moreover, the keys are processed using an elliptic curve cryptosystem, eliminating the need for an additional channel to transmit the keys, thus enhancing security. Experimental results and algorithm analysis demonstrate that the algorithm offers strong security and high efficiency, making it suitable for remote sensing image encryption tasks.
Funding: Supported by the National Natural Science Foundation of China (72174172, 71774134) and the Fundamental Research Funds for Central University, Southwest Minzu University (2022NYXXS094).
Abstract: Remote sensing image object detection is one of the core tasks of remote sensing image processing. In recent years, with the development of deep learning, great progress has been made in object detection in remote sensing. However, dense small targets, complex backgrounds, and poor target positioning accuracy in remote sensing images still make the detection of remote sensing targets difficult. To solve these problems, this research proposes a remote sensing image object detection algorithm based on an improved YOLOX-S. Firstly, the Efficient Channel Attention (ECA) module is introduced to improve the network's ability to extract features in the image and suppress useless information such as background. Secondly, the loss function is optimized to improve the regression accuracy of the target bounding box. We evaluate the effectiveness of our algorithm on the NWPU VHR-10 remote sensing image dataset. The experimental results show that the detection accuracy of the algorithm can reach 95.5% without increasing the number of parameters. This is a significant improvement over the original YOLOX-S network, and the detection performance is much better than that of some other mainstream remote sensing image detection methods. In addition, our method shows good generalization in experiments on aircraft images in the RSOD dataset.
Funding: The National Natural Science Foundation of China (42001408, 61806097).
Abstract: Significant advancements have been achieved in road surface extraction based on high-resolution remote sensing image processing. Most current methods rely on fully supervised learning, which necessitates enormous human effort to label the image. Within this field, other research endeavors utilize weakly supervised methods. These approaches aim to reduce the expenses associated with annotation by leveraging sparsely annotated data, such as scribbles. This paper presents a novel technique called a weakly supervised network using scribble-supervised and edge-mask (WSSE-net). This network is a three-branch network architecture, whereby each branch is equipped with a distinct decoder module dedicated to road extraction tasks. One of the branches is dedicated to generating edge masks using edge detection algorithms and optimizing road edge details. The other two branches supervise the model’s training by employing scribble labels and spreading scribble information throughout the image. To address the flaw of earlier approaches, in which pseudo-labels were created once and not updated during network training, we use mixup to blend prediction results dynamically and continually update new pseudo-labels to steer network training. Our solution demonstrates efficient operation by simultaneously considering both edge-mask aid and dynamic pseudo-label support. The studies are conducted on three separate road datasets, which consist primarily of high-resolution remote sensing satellite photos and drone images. The experimental findings suggest that our methodology performs better than advanced scribble-supervised approaches and specific traditional fully supervised methods.
Funding: This project is supported by the National Natural Science Foundation of China (NSFC) (No. 61902158).
Abstract: The degradation of optical remote sensing images due to atmospheric haze poses a significant obstacle, profoundly impeding their effective utilization across various domains. Dehazing methodologies have emerged as pivotal components of image preprocessing, fostering an improvement in the quality of remote sensing imagery. This enhancement renders remote sensing data more indispensable, thereby enhancing the accuracy of target identification. Conventional defogging techniques based on simplistic atmospheric degradation models have proven inadequate for mitigating non-uniform haze within remotely sensed images. In response to this challenge, a novel UNet Residual Attention Network (URA-Net) is proposed. This paradigmatic approach materializes as an end-to-end convolutional neural network distinguished by its utilization of multi-scale dense feature fusion clusters and gated jump connections. The essence of our methodology lies in local feature fusion within dense residual clusters, enabling the extraction of pertinent features from both preceding and current local data, depending on contextual demands. The intelligently orchestrated gated structures facilitate the propagation of these features to the decoder, resulting in superior outcomes in haze removal. Empirical validation through a plethora of experiments substantiates the efficacy of URA-Net, demonstrating its superior performance compared to existing methods when applied to established datasets for remote sensing image defogging. On the RICE-1 dataset, URA-Net achieves a Peak Signal-to-Noise Ratio (PSNR) of 29.07 dB, surpassing the Dark Channel Prior (DCP) by 11.17 dB, the All-in-One Network for Dehazing (AOD) by 7.82 dB, the Optimal Transmission Map and Adaptive Atmospheric Light For Dehazing (OTM-AAL) by 5.37 dB, the Unsupervised Single Image Dehazing (USID) by 8.0 dB, and the Superpixel-based Remote Sensing Image Dehazing (SRD) by 8.5 dB. Particularly noteworthy, on the SateHaze1k dataset, URA-Net attains preeminence in overall performance, yielding defogged images characterized by consistent visual quality. This underscores the contribution of the research to the advancement of remote sensing technology, providing a robust and efficient solution for alleviating the adverse effects of haze on image quality.
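The PSNR figures quoted in this abstract follow the standard definition PSNR = 10·log10(MAX² / MSE). A minimal sketch for 8-bit images stored as flat lists is given below; the function name and the `max_value` default of 255 are assumptions, and the paper's exact evaluation settings (color handling, averaging over images) are not stated in the abstract.

```python
import math


def psnr(restored, reference, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat images."""
    if len(restored) != len(reference) or not restored:
        raise ValueError("images must be non-empty and equally sized")
    mse = sum((a - b) ** 2 for a, b in zip(restored, reference)) / len(restored)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    dehazed = [100, 100, 180, 60]
    ground_truth = [110, 90, 175, 62]
    print(psnr(dehazed, ground_truth))
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.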
Funding: National Natural Science Foundation of China (No. 61903078); Fundamental Research Funds for the Central Universities, China (No. 2232021A-10); Shanghai Sailing Program, China (No. 22YF1401300); Natural Science Foundation of Shanghai, China (No. 20ZR1400400).
Abstract: Extracting building contours from aerial images is a fundamental task in remote sensing. Current building extraction methods cannot accurately extract building contour information and have errors in extracting small-scale buildings. This paper introduces a novel dense feature iterative (DFI) fusion network, denoted as DFINet, for extracting building contours. The network uses a DFI decoder to fuse semantic information at different scales and learns the building contour knowledge, producing the last features through iterative fusion. The dense feature fusion (DFF) module combines features at multiple scales. We employ the contour reconstruction (CR) module to access the final predictions. Extensive experiments validate the effectiveness of the DFINet on two different remote sensing datasets, the INRIA aerial image dataset and the Wuhan University (WHU) building dataset. On the INRIA aerial image dataset, our method achieves the highest intersection over union (IoU), overall accuracy (OA), and F1 scores compared to other state-of-the-art methods.
Funding: Supported by the National Natural Science Foundation of China [grant number 41930104] and by the Research Grants Council of Hong Kong [grant number PolyU 152219/18E].
Abstract: Information on Land Use and Land Cover Map (LULCM) is essential for environment and socioeconomic applications. Such maps are generally derived from Multispectral Remote Sensing Images (MRSI) via classification. The classification process can be described as information flow from images to maps through a trained classifier. Characterizing the information flow is essential for understanding the classification mechanism, providing solutions that address such theoretical issues as “what is the maximum number of classes that can be classified from a given MRSI?” and “how much information gain can be obtained?” Consequently, two interesting questions naturally arise, i.e. (i) How can we characterize the information flow? and (ii) What is the mathematical form of the information flow? To answer these two questions, this study first hypothesizes that thermodynamic entropy is the appropriate measure of information for both MRSI and LULCM. This hypothesis is then supported by kinetic-theory-based experiments. Thereafter, upon such an entropy, a generalized Jarzynski equation is formulated to mathematically model the information flow, which contains such parameters as thermodynamic entropy of MRSI, thermodynamic entropy of LULCM, weighted F1-score (classification accuracy), and total number of classes. This generalized Jarzynski equation has been successfully validated by hypothesis-driven experiments where 694 Sentinel-2 images are classified into 10 classes by four classical classifiers. This study provides a way for linking thermodynamic laws and concepts to the characterization and understanding of information flow in land cover classification, opening a new door for constructing domain knowledge.
Abstract: This study aimed to establish a forest resources management information system for forest farms based on a B/S (browser/server) structured WebGIS. The trial forest farm of the Hunan Academy of Forestry served as the research field, and forest resources field survey data, ETM+ remote sensing data, and basic geographical information data served as the research material. The work comprised the extraction of forest resource data in the forest farm, a requirement analysis of the system functions, and the establishment of the required software and hardware environment, with the aim of realizing the management, query, editing, analysis, statistics, and other functions of forest resources information to manage the forest resources.
Funding: Supported by the National Natural Science Foundation of China (61702528, 61806212).
Abstract: In the field of satellite imagery, remote sensing image captioning (RSIC) is a hot topic, with challenges of overfitting and difficulty in aligning images and text. To address these issues, this paper proposes a vision-language aligning paradigm for RSIC to jointly represent vision and language. First, a new RSIC dataset, DIOR-Captions, is built by augmenting the object detection in optical remote sensing images (DIOR) dataset with manually annotated Chinese and English contents. Second, a Vision-Language aligning model with Cross-modal Attention (VLCA) is presented to generate accurate and abundant bilingual descriptions for remote sensing images. Third, a cross-modal learning network is introduced to address the problem of visual-lingual alignment. Notably, VLCA is also applied to end-to-end Chinese caption generation by using a pre-trained Chinese language model. The experiments are carried out with various baselines to validate VLCA on the proposed dataset. The results demonstrate that the proposed algorithm is more descriptive and informative than existing algorithms in producing captions.
Funding: Supported by the National Natural Science Foundation of China (Award Number: 51704205), Key R&D Plan projects in Shanxi Province of China (Award Number: 201803D31044), the Education Department Natural Science Foundation in Guizhou of China (Award Number: KY (2017) 097), and the High-Level Talents Fund of Guizhou University of Engineering Science (Award Number: G2015005).
Abstract: Landslides, collapses, and cracks are the main types of geological hazards, which threaten the safety of human life and property at all times. In emergency surveying and mapping, manual field investigation and recognition is time-consuming and laborious, while identifying ground hazards from satellite images suffers from problems such as time lag, low resolution, and difficulty in selecting maps on demand. In this paper, photogrammetry at a resolution of 10 cm per pixel is carried out over a geological hazard-prone area of Taohuagou, Shanxi Province, China, using a DJ 4 UAV. The digital orthophoto model (DOM), digital surface model (DSM), and three-dimensional point cloud model (3DPCM) are generated for this region. Two visual interpretation methods are proposed: one for cracks, based on the DOM (as main) and the 3DPCM (as auxiliary), and one for landslides and collapses, based on the 3DPCM (as main) and the DOM and DSM (as auxiliary). Based on the low-altitude UAV remote sensing images, the shape characteristics, geological characteristics, and distribution of the identified hazards are analyzed. The results show that, using low-altitude UAV remote sensing images, the combination of main and auxiliary data can quickly and accurately identify landslides, collapses, and cracks; the accuracy of crack identification is 93%, and the accuracy of landslide and collapse identification is 100%. The hazards mainly occur in silty clay and mudstone geology and are greatly affected by slope foot excavation. This study can play a great role in the recognition of sudden hazards from low-altitude UAV remote sensing images.
Funding: Supported by the National Natural Science Foundation of China [grant number 41671452].
Abstract: Although the Convolutional Neural Network (CNN) has shown great potential for land cover classification, the frequently used single-scale convolution kernel limits the scope of information extraction. Therefore, we propose a Multi-Scale Fully Convolutional Network (MSFCN) with a multi-scale convolutional kernel as well as a Channel Attention Block (CAB) and a Global Pooling Module (GPM) in this paper to exploit discriminative representations from two-dimensional (2D) satellite images. Meanwhile, to explore the ability of the proposed MSFCN for spatio-temporal images, we expand our MSFCN to three dimensions using a three-dimensional (3D) CNN, capable of harnessing each land cover category’s time series interaction from the reshaped spatio-temporal remote sensing images. To verify the effectiveness of the proposed MSFCN, we conduct experiments on two spatial datasets and two spatio-temporal datasets. The proposed MSFCN achieves 60.366% on the WHDLD dataset and 75.127% on the GID dataset in terms of mIoU index, while the figures for the two spatio-temporal datasets are 87.753% and 77.156%. Extensive comparative experiments and ablation studies demonstrate the effectiveness of the proposed MSFCN.