Visible and infrared(RGB-IR)fusion object detection plays an important role in security,disaster relief,etc.In recent years,deep-learning-based RGB-IR fusion detection methods have been developing rapidly,but still st...Visible and infrared(RGB-IR)fusion object detection plays an important role in security,disaster relief,etc.In recent years,deep-learning-based RGB-IR fusion detection methods have been developing rapidly,but still struggle to deal with the complex and changing scenarios captured by drones,mainly due to two reasons:(A)RGB-IR fusion detectors are susceptible to inferior inputs that degrade performance and stability.(B)RGB-IR fusion detectors are susceptible to redundant features that reduce accuracy and efficiency.In this paper,an innovative RGB-IR fusion detection framework based on global-local feature optimization,named GLFDet,is proposed to improve the detection performance and efficiency of drone-captured objects.The key components of GLFDet include a Global Feature Optimization(GFO)module,a Local Feature Optimization(LFO)module and a Channel Separation Fusion(CSF)module.Specifically,GFO calculates the information content of the input image from the frequency domain and optimizes the features holistically.Then,LFO dynamically selects high-value features and filters out low-value features before fusion,which significantly improves the efficiency of fusion.Finally,CSF fuses the RGB and IR features across the corresponding channels,which avoids the rearrangement of the channel relationships and enhances the model stability.Extensive experimental results show that the proposed method achieves the best performance on three popular RGB-IR datasets Drone Vehicle,VEDAI,and LLVIP.In addition,GLFDet is more lightweight than other comparable models,making it more appealing to edge devices such as drones.The code is available at https://github.com/lao chen330/GLFDet.展开更多
The spatial offset of bridge has a significant impact on the safety,comfort,and durability of high-speed railway(HSR)operations,so it is crucial to rapidly and effectively detect the spatial offset of operational HSR ...The spatial offset of bridge has a significant impact on the safety,comfort,and durability of high-speed railway(HSR)operations,so it is crucial to rapidly and effectively detect the spatial offset of operational HSR bridges.Drive-by monitoring of bridge uneven settlement demonstrates significant potential due to its practicality,cost-effectiveness,and efficiency.However,existing drive-by methods for detecting bridge offset have limitations such as reliance on a single data source,low detection accuracy,and the inability to identify lateral deformations of bridges.This paper proposes a novel drive-by inspection method for spatial offset of HSR bridge based on multi-source data fusion of comprehensive inspection train.Firstly,dung beetle optimizer-variational mode decomposition was employed to achieve adaptive decomposition of non-stationary dynamic signals,and explore the hidden temporal relationships in the data.Subsequently,a long short-term memory neural network was developed to achieve feature fusion of multi-source signal and accurate prediction of spatial settlement of HSR bridge.A dataset of track irregularities and CRH380A high-speed train responses was generated using a 3D train-track-bridge interaction model,and the accuracy and effectiveness of the proposed hybrid deep learning model were numerically validated.Finally,the reliability of the proposed drive-by inspection method was further validated by analyzing the actual measurement data obtained from comprehensive inspection train.The research findings indicate that the proposed approach enables rapid and accurate detection of spatial offset in HSR bridge,ensuring the long-term operational safety of HSR bridges.展开更多
In recent years,with the rapid advancement of artificial intelligence,object detection algorithms have made significant strides in accuracy and computational efficiency.Notably,research and applications of Anchor-Free...In recent years,with the rapid advancement of artificial intelligence,object detection algorithms have made significant strides in accuracy and computational efficiency.Notably,research and applications of Anchor-Free models have opened new avenues for real-time target detection in optical remote sensing images(ORSIs).However,in the realmof adversarial attacks,developing adversarial techniques tailored to Anchor-Freemodels remains challenging.Adversarial examples generated based on Anchor-Based models often exhibit poor transferability to these new model architectures.Furthermore,the growing diversity of Anchor-Free models poses additional hurdles to achieving robust transferability of adversarial attacks.This study presents an improved cross-conv-block feature fusion You Only Look Once(YOLO)architecture,meticulously engineered to facilitate the extraction ofmore comprehensive semantic features during the backpropagation process.To address the asymmetry between densely distributed objects in ORSIs and the corresponding detector outputs,a novel dense bounding box attack strategy is proposed.This approach leverages dense target bounding boxes loss in the calculation of adversarial loss functions.Furthermore,by integrating translation-invariant(TI)and momentum-iteration(MI)adversarial methodologies,the proposed framework significantly improves the transferability of adversarial attacks.Experimental results demonstrate that our method achieves superior adversarial attack performance,with adversarial transferability rates(ATR)of 67.53%on the NWPU VHR-10 dataset and 90.71%on the HRSC2016 dataset.Compared to ensemble adversarial attack and cascaded adversarial attack approaches,our method generates adversarial examples in an average of 0.64 s,representing an approximately 14.5%improvement in efficiency under equivalent conditions.展开更多
Camouflaged Object Detection(COD)aims to identify objects that share highly similar patterns—such as texture,intensity,and color—with their surrounding environment.Due to their intrinsic resemblance to the backgroun...Camouflaged Object Detection(COD)aims to identify objects that share highly similar patterns—such as texture,intensity,and color—with their surrounding environment.Due to their intrinsic resemblance to the background,camouflaged objects often exhibit vague boundaries and varying scales,making it challenging to accurately locate targets and delineate their indistinct edges.To address this,we propose a novel camouflaged object detection network called Edge-Guided and Multi-scale Fusion Network(EGMFNet),which leverages edge-guided multi-scale integration for enhanced performance.The model incorporates two innovative components:a Multi-scale Fusion Module(MSFM)and an Edge-Guided Attention Module(EGA).These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries.Moreover,recognizing the rich contextual information in fused features,we introduce a Dual-Branch Global Context Module(DGCM)to refine features using extensive global context,thereby generatingmore informative representations.Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics.Specifically,on COD10K,our EGMFNet-P improves F_(β)by 4.8 points and reduces mean absolute error(MAE)by 0.006 compared with ZoomNeXt;on NC4K,it achieves a 3.6-point increase in F_(β).OnCAMO and CHAMELEON,it obtains 4.5-point increases in F_(β),respectively.These consistent gains substantiate the superiority and robustness of EGMFNet.展开更多
This research centers on structural health monitoring of bridges,a critical transportation infrastructure.Owing to the cumulative action of heavy vehicle loads,environmental variations,and material aging,bridge compon...This research centers on structural health monitoring of bridges,a critical transportation infrastructure.Owing to the cumulative action of heavy vehicle loads,environmental variations,and material aging,bridge components are prone to cracks and other defects,severely compromising structural safety and service life.Traditional inspection methods relying on manual visual assessment or vehicle-mounted sensors suffer from low efficiency,strong subjectivity,and high costs,while conventional image processing techniques and early deep learning models(e.g.,UNet,Faster R-CNN)still performinadequately in complex environments(e.g.,varying illumination,noise,false cracks)due to poor perception of fine cracks andmulti-scale features,limiting practical application.To address these challenges,this paper proposes CACNN-Net(CBAM-Augmented CNN),a novel dual-encoder architecture that innovatively couples a CNN for local detail extraction with a CBAM-Transformer for global context modeling.A key contribution is the dedicated Feature FusionModule(FFM),which strategically integratesmulti-scale features and focuses attention on crack regions while suppressing irrelevant noise.Experiments on bridge crack datasets demonstrate that CACNNNet achieves a precision of 77.6%,a recall of 79.4%,and an mIoU of 62.7%.These results significantly outperform several typical models(e.g.,UNet-ResNet34,Deeplabv3),confirming their superior accuracy and robust generalization,providing a high-precision automated solution for bridge crack detection and a novel network design paradigm for structural surface defect identification in complex scenarios,while future research may integrate physical features like depth information to advance intelligent infrastructure maintenance and digital twin management.展开更多
Image registration is an indispensable component in multi-source remote sensing image processing. In this paper, we put forward a remote sensing image registration method by including an improved multi-scale and multi...Image registration is an indispensable component in multi-source remote sensing image processing. In this paper, we put forward a remote sensing image registration method by including an improved multi-scale and multi-direction Harris algorithm and a novel compound feature. Multi-scale circle Gaussian combined invariant moments and multi-direction gray level co-occurrence matrix are extracted as features for image matching. The proposed algorithm is evaluated on numerous multi-source remote sensor images with noise and illumination changes. Extensive experimental studies prove that our proposed method is capable of receiving stable and even distribution of key points as well as obtaining robust and accurate correspondence matches. It is a promising scheme in multi-source remote sensing image registration.展开更多
Deep learning has made significant progress in the field of oriented object detection for remote sensing images.However,existing methods still face challenges when dealing with difficult tasks such as multi-scale targ...Deep learning has made significant progress in the field of oriented object detection for remote sensing images.However,existing methods still face challenges when dealing with difficult tasks such as multi-scale targets,complex backgrounds,and small objects in remote sensing.Maintaining model lightweight to address resource constraints in remote sensing scenarios while improving task completion for remote sensing tasks remains a research hotspot.Therefore,we propose an enhanced multi-scale feature extraction lightweight network EM-YOLO based on the YOLOv8s architecture,specifically optimized for the characteristics of large target scale variations,diverse orientations,and numerous small objects in remote sensing images.Our innovations lie in two main aspects:First,a dynamic snake convolution(DSC)is introduced into the backbone network to enhance the model’s feature extraction capability for oriented targets.Second,an innovative focusing-diffusion module is designed in the feature fusion neck to effectively integrate multi-scale feature information.Finally,we introduce Layer-Adaptive Sparsity for magnitude-based Pruning(LASP)method to perform lightweight network pruning to better complete tasks in resource-constrained scenarios.Experimental results on the lightweight platform Orin demonstrate that the proposed method significantly outperforms the original YOLOv8s model in oriented remote sensing object detection tasks,and achieves comparable or superior performance to state-of-the-art methods on three authoritative remote sensing datasets(DOTA v1.0,DOTA v1.5,and HRSC2016).展开更多
At inference time,deep neural networks are susceptible to backdoor attacks,which can produce attackercontrolled outputs when inputs contain carefully crafted triggers.Existing defense methods often focus on specific a...At inference time,deep neural networks are susceptible to backdoor attacks,which can produce attackercontrolled outputs when inputs contain carefully crafted triggers.Existing defense methods often focus on specific attack types or incur high costs,such as data cleaning or model fine-tuning.In contrast,we argue that it is possible to achieve effective and generalizable defense without removing triggers or incurring high model-cleaning costs.Fromthe attacker’s perspective and based on characteristics of vulnerable neuron activation anomalies,we propose an Adaptive Feature Injection(AFI)method for black-box backdoor detection.AFI employs a pre-trained image encoder to extract multi-level deep features and constructs a dynamic weight fusionmechanism for precise identification and interception of poisoned samples.Specifically,we select the control samples with the largest feature differences fromthe clean dataset via feature-space analysis,and generate blended sample pairs with the test sample using dynamic linear interpolation.The detection statistic is computed by measuring the divergence G(x)in model output responses.We systematically evaluate the effectiveness of AFI against representative backdoor attacks,including BadNets,Blend,WaNet,and IAB,on three benchmark datasets:MNIST,CIFAR-10,and ImageNet.Experimental results show that AFI can effectively detect poisoned samples,achieving average detection rates of 95.20%,94.15%,and 86.49%on these datasets,respectively.Compared with existing methods,AFI demonstrates strong cross-domain generalization ability and robustness to unknown attacks.展开更多
The initial noise present in the depth images obtained with RGB-D sensors is a combination of hardware limitations in addition to the environmental factors,due to the limited capabilities of sensors,which also produce...The initial noise present in the depth images obtained with RGB-D sensors is a combination of hardware limitations in addition to the environmental factors,due to the limited capabilities of sensors,which also produce poor computer vision results.The common image denoising techniques tend to remove significant image details and also remove noise,provided they are based on space and frequency filtering.The updated framework presented in this paper is a novel denoising model that makes use of Boruta-driven feature selection using a Long Short-Term Memory Autoencoder(LSTMAE).The Boruta algorithm identifies the most useful depth features that are used to maximize the spatial structure integrity and reduce redundancy.An LSTMAE is then used to process these selected features and model depth pixel sequences to generate robust,noise-resistant representations.The system uses the encoder to encode the input data into a latent space that has been compressed before it is decoded to retrieve the clean image.Experiments on a benchmark data set show that the suggested technique attains a PSNR of 45 dB and an SSIM of 0.90,which is 10 dB higher than the performance of conventional convolutional autoencoders and 15 times higher than that of the wavelet-based models.Moreover,the feature selection step will decrease the input dimensionality by 40%,resulting in a 37.5%reduction in training time and a real-time inference rate of 200 FPS.Boruta-LSTMAE framework,therefore,offers a highly efficient and scalable system for depth image denoising,with a high potential to be applied to close-range 3D systems,such as robotic manipulation and gesture-based interfaces.展开更多
Benthic habitat mapping is an emerging discipline in the international marine field in recent years,providing an effective tool for marine spatial planning,marine ecological management,and decision-making applications...Benthic habitat mapping is an emerging discipline in the international marine field in recent years,providing an effective tool for marine spatial planning,marine ecological management,and decision-making applications.Seabed sediment classification is one of the main contents of seabed habitat mapping.In response to the impact of remote sensing imaging quality and the limitations of acoustic measurement range,where a single data source does not fully reflect the substrate type,we proposed a high-precision seabed habitat sediment classification method that integrates data from multiple sources.Based on WorldView-2 multi-spectral remote sensing image data and multibeam bathymetry data,constructed a random forests(RF)classifier with optimal feature selection.A seabed sediment classification experiment integrating optical remote sensing and acoustic remote sensing data was carried out in the shallow water area of Wuzhizhou Island,Hainan,South China.Different seabed sediment types,such as sand,seagrass,and coral reefs were effectively identified,with an overall classification accuracy of 92%.Experimental results show that RF matrix optimized by fusing multi-source remote sensing data for feature selection were better than the classification results of simple combinations of data sources,which improved the accuracy of seabed sediment classification.Therefore,the method proposed in this paper can be effectively applied to high-precision seabed sediment classification and habitat mapping around islands and reefs.展开更多
An improved model based on you only look once version 8(YOLOv8)is proposed to solve the problem of low detection accuracy due to the diversity of object sizes in optical remote sensing images.Firstly,the feature pyram...An improved model based on you only look once version 8(YOLOv8)is proposed to solve the problem of low detection accuracy due to the diversity of object sizes in optical remote sensing images.Firstly,the feature pyramid network(FPN)structure of the original YOLOv8 mode is replaced by the generalized-FPN(GFPN)structure in GiraffeDet to realize the"cross-layer"and"cross-scale"adaptive feature fusion,to enrich the semantic information and spatial information on the feature map to improve the target detection ability of the model.Secondly,a pyramid-pool module of multi atrous spatial pyramid pooling(MASPP)is designed by using the idea of atrous convolution and feature pyramid structure to extract multi-scale features,so as to improve the processing ability of the model for multi-scale objects.The experimental results show that the detection accuracy of the improved YOLOv8 model on DIOR dataset is 92%and mean average precision(mAP)is 87.9%,respectively 3.5%and 1.7%higher than those of the original model.It is proved the detection and classification ability of the proposed model on multi-dimensional optical remote sensing target has been improved.展开更多
In the article“A Lightweight Approach for Skin Lesion Detection through Optimal Features Fusion”by Khadija Manzoor,Fiaz Majeed,Ansar Siddique,Talha Meraj,Hafiz Tayyab Rauf,Mohammed A.El-Meligy,Mohamed Sharaf,Abd Ela...In the article“A Lightweight Approach for Skin Lesion Detection through Optimal Features Fusion”by Khadija Manzoor,Fiaz Majeed,Ansar Siddique,Talha Meraj,Hafiz Tayyab Rauf,Mohammed A.El-Meligy,Mohamed Sharaf,Abd Elatty E.Abd Elgawad Computers,Materials&Continua,2022,Vol.70,No.1,pp.1617–1630.DOI:10.32604/cmc.2022.018621,URL:https://www.techscience.com/cmc/v70n1/44361,there was an error regarding the affiliation for the author Hafiz Tayyab Rauf.Instead of“Centre for Smart Systems,AI and Cybersecurity,Staffordshire University,Stoke-on-Trent,UK”,the affiliation should be“Independent Researcher,Bradford,BD80HS,UK”.展开更多
Solar cell defect detection is crucial for quality inspection in photovoltaic power generation modules.In the production process,defect samples occur infrequently and exhibit random shapes and sizes,which makes it cha...Solar cell defect detection is crucial for quality inspection in photovoltaic power generation modules.In the production process,defect samples occur infrequently and exhibit random shapes and sizes,which makes it challenging to collect defective samples.Additionally,the complex surface background of polysilicon cell wafers complicates the accurate identification and localization of defective regions.This paper proposes a novel Lightweight Multiscale Feature Fusion network(LMFF)to address these challenges.The network comprises a feature extraction network,a multi-scale feature fusion module(MFF),and a segmentation network.Specifically,a feature extraction network is proposed to obtain multi-scale feature outputs,and a multi-scale feature fusion module(MFF)is used to fuse multi-scale feature information effectively.In order to capture finer-grained multi-scale information from the fusion features,we propose a multi-scale attention module(MSA)in the segmentation network to enhance the network’s ability for small target detection.Moreover,depthwise separable convolutions are introduced to construct depthwise separable residual blocks(DSR)to reduce the model’s parameter number.Finally,to validate the proposed method’s defect segmentation and localization performance,we constructed three solar cell defect detection datasets:SolarCells,SolarCells-S,and PVEL-S.SolarCells and SolarCells-S are monocrystalline silicon datasets,and PVEL-S is a polycrystalline silicon dataset.Experimental results show that the IOU of our method on these three datasets can reach 68.5%,51.0%,and 92.7%,respectively,and the F1-Score can reach 81.3%,67.5%,and 96.2%,respectively,which surpasses other commonly usedmethods and verifies the effectiveness of our LMFF network.展开更多
The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible images.To meet these requirements,an autoencoder-based method f...The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible images.To meet these requirements,an autoencoder-based method for infrared and visible image fusion is proposed.The encoder designed according to the optimization objective consists of a base encoder and a detail encoder,which is used to extract low-frequency and high-frequency information from the image.This extraction may lead to some information not being captured,so a compensation encoder is proposed to supplement the missing information.Multi-scale decomposition is also employed to extract image features more comprehensively.The decoder combines low-frequency,high-frequency and supplementary information to obtain multi-scale features.Subsequently,the attention strategy and fusion module are introduced to perform multi-scale fusion for image reconstruction.Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception.展开更多
Feature fusion is an important technique in medical image classification that can improve diagnostic accuracy by integrating complementary information from multiple sources.Recently,Deep Learning(DL)has been widely us...Feature fusion is an important technique in medical image classification that can improve diagnostic accuracy by integrating complementary information from multiple sources.Recently,Deep Learning(DL)has been widely used in pulmonary disease diagnosis,such as pneumonia and tuberculosis.However,traditional feature fusion methods often suffer from feature disparity,information loss,redundancy,and increased complexity,hindering the further extension of DL algorithms.To solve this problem,we propose a Graph-Convolution Fusion Network with Self-Supervised Feature Alignment(Self-FAGCFN)to address the limitations of traditional feature fusion methods in deep learning-based medical image classification for respiratory diseases such as pneumonia and tuberculosis.The network integrates Convolutional Neural Networks(CNNs)for robust feature extraction from two-dimensional grid structures and Graph Convolutional Networks(GCNs)within a Graph Neural Network branch to capture features based on graph structure,focusing on significant node representations.Additionally,an Attention-Embedding Ensemble Block is included to capture critical features from GCN outputs.To ensure effective feature alignment between pre-and post-fusion stages,we introduce a feature alignment loss that minimizes disparities.Moreover,to address the limitations of proposed methods,such as inappropriate centroid discrepancies during feature alignment and class imbalance in the dataset,we develop a Feature-Centroid Fusion(FCF)strategy and a Multi-Level Feature-Centroid Update(MLFCU)algorithm,respectively.Extensive experiments on public datasets LungVision and Chest-Xray demonstrate that the Self-FAGCFN model significantly outperforms existing methods in diagnosing pneumonia and tuberculosis,highlighting its potential for practical medical applications.展开更多
Globally,diabetic retinopathy(DR)is the primary cause of blindness,affecting millions of people worldwide.This widespread impact underscores the critical need for reliable and precise diagnostic techniques to ensure p...Globally,diabetic retinopathy(DR)is the primary cause of blindness,affecting millions of people worldwide.This widespread impact underscores the critical need for reliable and precise diagnostic techniques to ensure prompt diagnosis and effective treatment.Deep learning-based automated diagnosis for diabetic retinopathy can facilitate early detection and treatment.However,traditional deep learning models that focus on local views often learn feature representations that are less discriminative at the semantic level.On the other hand,models that focus on global semantic-level information might overlook critical,subtle local pathological features.To address this issue,we propose an adaptive multi-scale feature fusion network called(AMSFuse),which can adaptively combine multi-scale global and local features without compromising their individual representation.Specifically,our model incorporates global features for extracting high-level contextual information from retinal images.Concurrently,local features capture fine-grained details,such as microaneurysms,hemorrhages,and exudates,which are critical for DR diagnosis.These global and local features are adaptively fused using a fusion block,followed by an Integrated Attention Mechanism(IAM)that refines the fused features by emphasizing relevant regions,thereby enhancing classification accuracy for DR classification.Our model achieves 86.3%accuracy on the APTOS dataset and 96.6%RFMiD,both of which are comparable to state-of-the-art methods.展开更多
Biometric characteristics are playing a vital role in security for the last few years.Human gait classification in video sequences is an important biometrics attribute and is used for security purposes.A new framework...Biometric characteristics are playing a vital role in security for the last few years.Human gait classification in video sequences is an important biometrics attribute and is used for security purposes.A new framework for human gait classification in video sequences using deep learning(DL)fusion assisted and posterior probability-based moth flames optimization(MFO)is proposed.In the first step,the video frames are resized and finetuned by two pre-trained lightweight DL models,EfficientNetB0 and MobileNetV2.Both models are selected based on the top-5 accuracy and less number of parameters.Later,both models are trained through deep transfer learning and extracted deep features fused using a voting scheme.In the last step,the authors develop a posterior probabilitybased MFO feature selection algorithm to select the best features.The selected features are classified using several supervised learning methods.The CASIA-B publicly available dataset has been employed for the experimental process.On this dataset,the authors selected six angles such as 0°,18°,90°,108°,162°,and 180°and obtained an average accuracy of 96.9%,95.7%,86.8%,90.0%,95.1%,and 99.7%.Results demonstrate comparable improvement in accuracy and significantly minimize the computational time with recent state-of-the-art techniques.展开更多
In order to address the challenges encountered in visual navigation for asteroid landing using traditional point features,such as significant recognition and extraction errors,low computational efficiency,and limited ...In order to address the challenges encountered in visual navigation for asteroid landing using traditional point features,such as significant recognition and extraction errors,low computational efficiency,and limited navigation accuracy,a novel approach for multi-type fusion visual navigation is proposed.This method aims to overcome the limitations of single-type features and enhance navigation accuracy.Analytical criteria for selecting multi-type features are introduced,which simultaneously improve computational efficiency and system navigation accuracy.Concerning pose estimation,both absolute and relative pose estimation methods based on multi-type feature fusion are proposed,and multi-type feature normalization is established,which significantly improves system navigation accuracy and lays the groundwork for flexible application of joint absolute-relative estimation.The feasibility and effectiveness of the proposed method are validated through simulation experiments through 4769 Castalia.展开更多
Multi-source data fusion provides high-precision spatial situational awareness essential for analyzing granular urban social activities.This study used Shanghai’s catering industry as a case study,leveraging electron...Multi-source data fusion provides high-precision spatial situational awareness essential for analyzing granular urban social activities.This study used Shanghai’s catering industry as a case study,leveraging electronic reviews and consumer data sourced from third-party restaurant platforms collected in 2021.By performing weighted processing on two-dimensional point-of-interest(POI)data,clustering hotspots of high-dimensional restaurant data were identified.A hierarchical network of restaurant hotspots was constructed following the Central Place Theory(CPT)framework,while the Geo-Informatic Tupu method was employed to resolve the challenges posed by network deformation in multi-scale processes.These findings suggest the necessity of enhancing the spatial balance of Shanghai’s urban centers by moderately increasing the number and service capacity of suburban centers at the urban periphery.Such measures would contribute to a more optimized urban structure and facilitate the outward dispersion of comfort-oriented facilities such as the restaurant industry.At a finer spatial scale,the distribution of restaurant hotspots demonstrates a polycentric and symmetric spatial pattern,with a developmental trend radiating outward along the city’s ring roads.This trend can be attributed to the efforts of restaurants to establish connections with other urban functional spaces,leading to the reconfiguration of urban spaces,expansion of restaurant-dedicated land use,and the reorganization of associated commercial activities.The results validate the existence of a polycentric urban structure in Shanghai but also highlight the instability of the restaurant hotspot network during cross-scale transitions.展开更多
Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images.Obtaining class-specific precise representations at different scales is a key aspect of feat...Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images.Obtaining class-specific precise representations at different scales is a key aspect of feature representation.However,existing methods often rely on the single-scale deep feature,neglecting shallow and deeper layer features,which poses challenges when predicting objects of varying scales within the same image.Although some studies have explored multi-scale features,they rarely address the flow of information between scales or efficiently obtain class-specific precise representations for features at different scales.To address these issues,we propose a two-stage,three-branch Transformer-based framework.The first stage incorporates multi-scale image feature extraction and hierarchical scale attention.This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales,improving the model’s generalization to diverse object scales.The second stage includes a global feature enhancement module and a region selection module.The global feature enhancement module strengthens interconnections between different image regions,mitigating the issue of incomplete represen-tations,while the region selection module models the cross-modal relationships between image features and labels.Together,these components enable the efficient acquisition of class-specific precise feature representations.Extensive experiments on public datasets,including COCO2014,VOC2007,and VOC2012,demonstrate the effectiveness of our proposed method.Our approach achieves consistent performance gains of 0.3%,0.4%,and 0.2%over state-of-the-art methods on the three datasets,respectively.These results validate the reliability and superiority of our approach for multi-label image classification.展开更多
基金supported by the National Natural Science Foundation of China(No.62276204)the Fundamental Research Funds for the Central Universities,China(No.YJSJ24011)+1 种基金the Natural Science Basic Research Program of Shaanxi,China(Nos.2022JM-340 and 2023-JC-QN-0710)the China Postdoctoral Science Foundation(Nos.2020T130494 and 2018M633470)。
文摘Visible and infrared(RGB-IR)fusion object detection plays an important role in security,disaster relief,etc.In recent years,deep-learning-based RGB-IR fusion detection methods have been developing rapidly,but still struggle to deal with the complex and changing scenarios captured by drones,mainly due to two reasons:(A)RGB-IR fusion detectors are susceptible to inferior inputs that degrade performance and stability.(B)RGB-IR fusion detectors are susceptible to redundant features that reduce accuracy and efficiency.In this paper,an innovative RGB-IR fusion detection framework based on global-local feature optimization,named GLFDet,is proposed to improve the detection performance and efficiency of drone-captured objects.The key components of GLFDet include a Global Feature Optimization(GFO)module,a Local Feature Optimization(LFO)module and a Channel Separation Fusion(CSF)module.Specifically,GFO calculates the information content of the input image from the frequency domain and optimizes the features holistically.Then,LFO dynamically selects high-value features and filters out low-value features before fusion,which significantly improves the efficiency of fusion.Finally,CSF fuses the RGB and IR features across the corresponding channels,which avoids the rearrangement of the channel relationships and enhances the model stability.Extensive experimental results show that the proposed method achieves the best performance on three popular RGB-IR datasets Drone Vehicle,VEDAI,and LLVIP.In addition,GLFDet is more lightweight than other comparable models,making it more appealing to edge devices such as drones.The code is available at https://github.com/lao chen330/GLFDet.
基金sponsored by the National Natural Science Foundation of China(Grant No.52178100).
文摘The spatial offset of bridge has a significant impact on the safety,comfort,and durability of high-speed railway(HSR)operations,so it is crucial to rapidly and effectively detect the spatial offset of operational HSR bridges.Drive-by monitoring of bridge uneven settlement demonstrates significant potential due to its practicality,cost-effectiveness,and efficiency.However,existing drive-by methods for detecting bridge offset have limitations such as reliance on a single data source,low detection accuracy,and the inability to identify lateral deformations of bridges.This paper proposes a novel drive-by inspection method for spatial offset of HSR bridge based on multi-source data fusion of comprehensive inspection train.Firstly,dung beetle optimizer-variational mode decomposition was employed to achieve adaptive decomposition of non-stationary dynamic signals,and explore the hidden temporal relationships in the data.Subsequently,a long short-term memory neural network was developed to achieve feature fusion of multi-source signal and accurate prediction of spatial settlement of HSR bridge.A dataset of track irregularities and CRH380A high-speed train responses was generated using a 3D train-track-bridge interaction model,and the accuracy and effectiveness of the proposed hybrid deep learning model were numerically validated.Finally,the reliability of the proposed drive-by inspection method was further validated by analyzing the actual measurement data obtained from comprehensive inspection train.The research findings indicate that the proposed approach enables rapid and accurate detection of spatial offset in HSR bridge,ensuring the long-term operational safety of HSR bridges.
文摘In recent years,with the rapid advancement of artificial intelligence,object detection algorithms have made significant strides in accuracy and computational efficiency.Notably,research and applications of Anchor-Free models have opened new avenues for real-time target detection in optical remote sensing images(ORSIs).However,in the realmof adversarial attacks,developing adversarial techniques tailored to Anchor-Freemodels remains challenging.Adversarial examples generated based on Anchor-Based models often exhibit poor transferability to these new model architectures.Furthermore,the growing diversity of Anchor-Free models poses additional hurdles to achieving robust transferability of adversarial attacks.This study presents an improved cross-conv-block feature fusion You Only Look Once(YOLO)architecture,meticulously engineered to facilitate the extraction ofmore comprehensive semantic features during the backpropagation process.To address the asymmetry between densely distributed objects in ORSIs and the corresponding detector outputs,a novel dense bounding box attack strategy is proposed.This approach leverages dense target bounding boxes loss in the calculation of adversarial loss functions.Furthermore,by integrating translation-invariant(TI)and momentum-iteration(MI)adversarial methodologies,the proposed framework significantly improves the transferability of adversarial attacks.Experimental results demonstrate that our method achieves superior adversarial attack performance,with adversarial transferability rates(ATR)of 67.53%on the NWPU VHR-10 dataset and 90.71%on the HRSC2016 dataset.Compared to ensemble adversarial attack and cascaded adversarial attack approaches,our method generates adversarial examples in an average of 0.64 s,representing an approximately 14.5%improvement in efficiency under equivalent conditions.
基金financially supported byChongqingUniversity of Technology Graduate Innovation Foundation(Grant No.gzlcx20253267).
文摘Camouflaged Object Detection(COD)aims to identify objects that share highly similar patterns—such as texture,intensity,and color—with their surrounding environment.Due to their intrinsic resemblance to the background,camouflaged objects often exhibit vague boundaries and varying scales,making it challenging to accurately locate targets and delineate their indistinct edges.To address this,we propose a novel camouflaged object detection network called Edge-Guided and Multi-scale Fusion Network(EGMFNet),which leverages edge-guided multi-scale integration for enhanced performance.The model incorporates two innovative components:a Multi-scale Fusion Module(MSFM)and an Edge-Guided Attention Module(EGA).These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries.Moreover,recognizing the rich contextual information in fused features,we introduce a Dual-Branch Global Context Module(DGCM)to refine features using extensive global context,thereby generatingmore informative representations.Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics.Specifically,on COD10K,our EGMFNet-P improves F_(β)by 4.8 points and reduces mean absolute error(MAE)by 0.006 compared with ZoomNeXt;on NC4K,it achieves a 3.6-point increase in F_(β).OnCAMO and CHAMELEON,it obtains 4.5-point increases in F_(β),respectively.These consistent gains substantiate the superiority and robustness of EGMFNet.
基金supported by the National Natural Science Foundation of China(No.52308332)the General Scientific Research Project of the Education Department of Zhejiang Province(No.Y202455824).
文摘This research centers on structural health monitoring of bridges,a critical transportation infrastructure.Owing to the cumulative action of heavy vehicle loads,environmental variations,and material aging,bridge components are prone to cracks and other defects,severely compromising structural safety and service life.Traditional inspection methods relying on manual visual assessment or vehicle-mounted sensors suffer from low efficiency,strong subjectivity,and high costs,while conventional image processing techniques and early deep learning models(e.g.,UNet,Faster R-CNN)still performinadequately in complex environments(e.g.,varying illumination,noise,false cracks)due to poor perception of fine cracks andmulti-scale features,limiting practical application.To address these challenges,this paper proposes CACNN-Net(CBAM-Augmented CNN),a novel dual-encoder architecture that innovatively couples a CNN for local detail extraction with a CBAM-Transformer for global context modeling.A key contribution is the dedicated Feature FusionModule(FFM),which strategically integratesmulti-scale features and focuses attention on crack regions while suppressing irrelevant noise.Experiments on bridge crack datasets demonstrate that CACNNNet achieves a precision of 77.6%,a recall of 79.4%,and an mIoU of 62.7%.These results significantly outperform several typical models(e.g.,UNet-ResNet34,Deeplabv3),confirming their superior accuracy and robust generalization,providing a high-precision automated solution for bridge crack detection and a novel network design paradigm for structural surface defect identification in complex scenarios,while future research may integrate physical features like depth information to advance intelligent infrastructure maintenance and digital twin management.
基金supported by National Nature Science Foundation of China (Nos. 61462046 and 61762052)Natural Science Foundation of Jiangxi Province (Nos. 20161BAB202049 and 20161BAB204172)+2 种基金the Bidding Project of the Key Laboratory of Watershed Ecology and Geographical Environment Monitoring, NASG (Nos. WE2016003, WE2016013 and WE2016015)the Science and Technology Research Projects of Jiangxi Province Education Department (Nos. GJJ160741, GJJ170632 and GJJ170633)the Art Planning Project of Jiangxi Province (Nos. YG2016250 and YG2017381)
文摘Image registration is an indispensable component in multi-source remote sensing image processing. In this paper, we put forward a remote sensing image registration method by including an improved multi-scale and multi-direction Harris algorithm and a novel compound feature. Multi-scale circle Gaussian combined invariant moments and multi-direction gray level co-occurrence matrix are extracted as features for image matching. The proposed algorithm is evaluated on numerous multi-source remote sensor images with noise and illumination changes. Extensive experimental studies prove that our proposed method is capable of receiving stable and even distribution of key points as well as obtaining robust and accurate correspondence matches. It is a promising scheme in multi-source remote sensing image registration.
基金funded by the Hainan Province Science and Technology Special Fund under Grant ZDYF2024GXJS292.
文摘Deep learning has made significant progress in the field of oriented object detection for remote sensing images.However,existing methods still face challenges when dealing with difficult tasks such as multi-scale targets,complex backgrounds,and small objects in remote sensing.Maintaining model lightweight to address resource constraints in remote sensing scenarios while improving task completion for remote sensing tasks remains a research hotspot.Therefore,we propose an enhanced multi-scale feature extraction lightweight network EM-YOLO based on the YOLOv8s architecture,specifically optimized for the characteristics of large target scale variations,diverse orientations,and numerous small objects in remote sensing images.Our innovations lie in two main aspects:First,a dynamic snake convolution(DSC)is introduced into the backbone network to enhance the model’s feature extraction capability for oriented targets.Second,an innovative focusing-diffusion module is designed in the feature fusion neck to effectively integrate multi-scale feature information.Finally,we introduce Layer-Adaptive Sparsity for magnitude-based Pruning(LASP)method to perform lightweight network pruning to better complete tasks in resource-constrained scenarios.Experimental results on the lightweight platform Orin demonstrate that the proposed method significantly outperforms the original YOLOv8s model in oriented remote sensing object detection tasks,and achieves comparable or superior performance to state-of-the-art methods on three authoritative remote sensing datasets(DOTA v1.0,DOTA v1.5,and HRSC2016).
基金supported by the National Natural Science Foundation of China Grant(No.61972133)Project of Leading Talents in Science and Technology Innovation for Thousands of People Plan in Henan Province Grant(No.204200510021)the Key Research and Development Plan Special Project of Henan Province Grant(No.241111211400).
文摘At inference time,deep neural networks are susceptible to backdoor attacks,which can produce attackercontrolled outputs when inputs contain carefully crafted triggers.Existing defense methods often focus on specific attack types or incur high costs,such as data cleaning or model fine-tuning.In contrast,we argue that it is possible to achieve effective and generalizable defense without removing triggers or incurring high model-cleaning costs.Fromthe attacker’s perspective and based on characteristics of vulnerable neuron activation anomalies,we propose an Adaptive Feature Injection(AFI)method for black-box backdoor detection.AFI employs a pre-trained image encoder to extract multi-level deep features and constructs a dynamic weight fusionmechanism for precise identification and interception of poisoned samples.Specifically,we select the control samples with the largest feature differences fromthe clean dataset via feature-space analysis,and generate blended sample pairs with the test sample using dynamic linear interpolation.The detection statistic is computed by measuring the divergence G(x)in model output responses.We systematically evaluate the effectiveness of AFI against representative backdoor attacks,including BadNets,Blend,WaNet,and IAB,on three benchmark datasets:MNIST,CIFAR-10,and ImageNet.Experimental results show that AFI can effectively detect poisoned samples,achieving average detection rates of 95.20%,94.15%,and 86.49%on these datasets,respectively.Compared with existing methods,AFI demonstrates strong cross-domain generalization ability and robustness to unknown attacks.
文摘The initial noise present in the depth images obtained with RGB-D sensors is a combination of hardware limitations in addition to the environmental factors,due to the limited capabilities of sensors,which also produce poor computer vision results.The common image denoising techniques tend to remove significant image details and also remove noise,provided they are based on space and frequency filtering.The updated framework presented in this paper is a novel denoising model that makes use of Boruta-driven feature selection using a Long Short-Term Memory Autoencoder(LSTMAE).The Boruta algorithm identifies the most useful depth features that are used to maximize the spatial structure integrity and reduce redundancy.An LSTMAE is then used to process these selected features and model depth pixel sequences to generate robust,noise-resistant representations.The system uses the encoder to encode the input data into a latent space that has been compressed before it is decoded to retrieve the clean image.Experiments on a benchmark data set show that the suggested technique attains a PSNR of 45 dB and an SSIM of 0.90,which is 10 dB higher than the performance of conventional convolutional autoencoders and 15 times higher than that of the wavelet-based models.Moreover,the feature selection step will decrease the input dimensionality by 40%,resulting in a 37.5%reduction in training time and a real-time inference rate of 200 FPS.Boruta-LSTMAE framework,therefore,offers a highly efficient and scalable system for depth image denoising,with a high potential to be applied to close-range 3D systems,such as robotic manipulation and gesture-based interfaces.
基金Supported by the National Natural Science Foundation of China(Nos.42376185,41876111)the Shandong Provincial Natural Science Foundation(No.ZR2023MD073)。
文摘Benthic habitat mapping is an emerging discipline in the international marine field in recent years,providing an effective tool for marine spatial planning,marine ecological management,and decision-making applications.Seabed sediment classification is one of the main contents of seabed habitat mapping.In response to the impact of remote sensing imaging quality and the limitations of acoustic measurement range,where a single data source does not fully reflect the substrate type,we proposed a high-precision seabed habitat sediment classification method that integrates data from multiple sources.Based on WorldView-2 multi-spectral remote sensing image data and multibeam bathymetry data,constructed a random forests(RF)classifier with optimal feature selection.A seabed sediment classification experiment integrating optical remote sensing and acoustic remote sensing data was carried out in the shallow water area of Wuzhizhou Island,Hainan,South China.Different seabed sediment types,such as sand,seagrass,and coral reefs were effectively identified,with an overall classification accuracy of 92%.Experimental results show that RF matrix optimized by fusing multi-source remote sensing data for feature selection were better than the classification results of simple combinations of data sources,which improved the accuracy of seabed sediment classification.Therefore,the method proposed in this paper can be effectively applied to high-precision seabed sediment classification and habitat mapping around islands and reefs.
基金supported by the National Natural Science Foundation of China(No.62241109)the Tianjin Science and Technology Commissioner Project(No.20YDTPJC01110)。
文摘An improved model based on you only look once version 8(YOLOv8)is proposed to solve the problem of low detection accuracy due to the diversity of object sizes in optical remote sensing images.Firstly,the feature pyramid network(FPN)structure of the original YOLOv8 mode is replaced by the generalized-FPN(GFPN)structure in GiraffeDet to realize the"cross-layer"and"cross-scale"adaptive feature fusion,to enrich the semantic information and spatial information on the feature map to improve the target detection ability of the model.Secondly,a pyramid-pool module of multi atrous spatial pyramid pooling(MASPP)is designed by using the idea of atrous convolution and feature pyramid structure to extract multi-scale features,so as to improve the processing ability of the model for multi-scale objects.The experimental results show that the detection accuracy of the improved YOLOv8 model on DIOR dataset is 92%and mean average precision(mAP)is 87.9%,respectively 3.5%and 1.7%higher than those of the original model.It is proved the detection and classification ability of the proposed model on multi-dimensional optical remote sensing target has been improved.
文摘In the article“A Lightweight Approach for Skin Lesion Detection through Optimal Features Fusion”by Khadija Manzoor,Fiaz Majeed,Ansar Siddique,Talha Meraj,Hafiz Tayyab Rauf,Mohammed A.El-Meligy,Mohamed Sharaf,Abd Elatty E.Abd Elgawad Computers,Materials&Continua,2022,Vol.70,No.1,pp.1617–1630.DOI:10.32604/cmc.2022.018621,URL:https://www.techscience.com/cmc/v70n1/44361,there was an error regarding the affiliation for the author Hafiz Tayyab Rauf.Instead of“Centre for Smart Systems,AI and Cybersecurity,Staffordshire University,Stoke-on-Trent,UK”,the affiliation should be“Independent Researcher,Bradford,BD80HS,UK”.
基金supported in part by the National Natural Science Foundation of China under Grants 62463002,62062021 and 62473033in part by the Guiyang Scientific Plan Project[2023]48–11,in part by QKHZYD[2023]010 Guizhou Province Science and Technology Innovation Base Construction Project“Key Laboratory Construction of Intelligent Mountain Agricultural Equipment”.
文摘Solar cell defect detection is crucial for quality inspection in photovoltaic power generation modules.In the production process,defect samples occur infrequently and exhibit random shapes and sizes,which makes it challenging to collect defective samples.Additionally,the complex surface background of polysilicon cell wafers complicates the accurate identification and localization of defective regions.This paper proposes a novel Lightweight Multiscale Feature Fusion network(LMFF)to address these challenges.The network comprises a feature extraction network,a multi-scale feature fusion module(MFF),and a segmentation network.Specifically,a feature extraction network is proposed to obtain multi-scale feature outputs,and a multi-scale feature fusion module(MFF)is used to fuse multi-scale feature information effectively.In order to capture finer-grained multi-scale information from the fusion features,we propose a multi-scale attention module(MSA)in the segmentation network to enhance the network’s ability for small target detection.Moreover,depthwise separable convolutions are introduced to construct depthwise separable residual blocks(DSR)to reduce the model’s parameter number.Finally,to validate the proposed method’s defect segmentation and localization performance,we constructed three solar cell defect detection datasets:SolarCells,SolarCells-S,and PVEL-S.SolarCells and SolarCells-S are monocrystalline silicon datasets,and PVEL-S is a polycrystalline silicon dataset.Experimental results show that the IOU of our method on these three datasets can reach 68.5%,51.0%,and 92.7%,respectively,and the F1-Score can reach 81.3%,67.5%,and 96.2%,respectively,which surpasses other commonly usedmethods and verifies the effectiveness of our LMFF network.
基金Supported by the Henan Province Key Research and Development Project(231111211300)the Central Government of Henan Province Guides Local Science and Technology Development Funds(Z20231811005)+2 种基金Henan Province Key Research and Development Project(231111110100)Henan Provincial Outstanding Foreign Scientist Studio(GZS2024006)Henan Provincial Joint Fund for Scientific and Technological Research and Development Plan(Application and Overcoming Technical Barriers)(242103810028)。
文摘The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible images.To meet these requirements,an autoencoder-based method for infrared and visible image fusion is proposed.The encoder designed according to the optimization objective consists of a base encoder and a detail encoder,which is used to extract low-frequency and high-frequency information from the image.This extraction may lead to some information not being captured,so a compensation encoder is proposed to supplement the missing information.Multi-scale decomposition is also employed to extract image features more comprehensively.The decoder combines low-frequency,high-frequency and supplementary information to obtain multi-scale features.Subsequently,the attention strategy and fusion module are introduced to perform multi-scale fusion for image reconstruction.Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception.
基金supported by the National Natural Science Foundation of China(62276092,62303167)the Postdoctoral Fellowship Program(Grade C)of China Postdoctoral Science Foundation(GZC20230707)+3 种基金the Key Science and Technology Program of Henan Province,China(242102211051,242102211042,212102310084)Key Scientiffc Research Projects of Colleges and Universities in Henan Province,China(25A520009)the China Postdoctoral Science Foundation(2024M760808)the Henan Province medical science and technology research plan joint construction project(LHGJ2024069).
文摘Feature fusion is an important technique in medical image classification that can improve diagnostic accuracy by integrating complementary information from multiple sources.Recently,Deep Learning(DL)has been widely used in pulmonary disease diagnosis,such as pneumonia and tuberculosis.However,traditional feature fusion methods often suffer from feature disparity,information loss,redundancy,and increased complexity,hindering the further extension of DL algorithms.To solve this problem,we propose a Graph-Convolution Fusion Network with Self-Supervised Feature Alignment(Self-FAGCFN)to address the limitations of traditional feature fusion methods in deep learning-based medical image classification for respiratory diseases such as pneumonia and tuberculosis.The network integrates Convolutional Neural Networks(CNNs)for robust feature extraction from two-dimensional grid structures and Graph Convolutional Networks(GCNs)within a Graph Neural Network branch to capture features based on graph structure,focusing on significant node representations.Additionally,an Attention-Embedding Ensemble Block is included to capture critical features from GCN outputs.To ensure effective feature alignment between pre-and post-fusion stages,we introduce a feature alignment loss that minimizes disparities.Moreover,to address the limitations of proposed methods,such as inappropriate centroid discrepancies during feature alignment and class imbalance in the dataset,we develop a Feature-Centroid Fusion(FCF)strategy and a Multi-Level Feature-Centroid Update(MLFCU)algorithm,respectively.Extensive experiments on public datasets LungVision and Chest-Xray demonstrate that the Self-FAGCFN model significantly outperforms existing methods in diagnosing pneumonia and tuberculosis,highlighting its potential for practical medical applications.
基金supported by the National Natural Science Foundation of China(No.62376287)the International Science and Technology Innovation Joint Base of Machine Vision and Medical Image Processing in Hunan Province(2021CB1013)the Natural Science Foundation of Hunan Province(Nos.2022JJ30762,2023JJ70016).
文摘Globally,diabetic retinopathy(DR)is the primary cause of blindness,affecting millions of people worldwide.This widespread impact underscores the critical need for reliable and precise diagnostic techniques to ensure prompt diagnosis and effective treatment.Deep learning-based automated diagnosis for diabetic retinopathy can facilitate early detection and treatment.However,traditional deep learning models that focus on local views often learn feature representations that are less discriminative at the semantic level.On the other hand,models that focus on global semantic-level information might overlook critical,subtle local pathological features.To address this issue,we propose an adaptive multi-scale feature fusion network called(AMSFuse),which can adaptively combine multi-scale global and local features without compromising their individual representation.Specifically,our model incorporates global features for extracting high-level contextual information from retinal images.Concurrently,local features capture fine-grained details,such as microaneurysms,hemorrhages,and exudates,which are critical for DR diagnosis.These global and local features are adaptively fused using a fusion block,followed by an Integrated Attention Mechanism(IAM)that refines the fused features by emphasizing relevant regions,thereby enhancing classification accuracy for DR classification.Our model achieves 86.3%accuracy on the APTOS dataset and 96.6%RFMiD,both of which are comparable to state-of-the-art methods.
基金King Saud University,Grant/Award Number:RSP2024R157。
文摘Biometric characteristics are playing a vital role in security for the last few years.Human gait classification in video sequences is an important biometrics attribute and is used for security purposes.A new framework for human gait classification in video sequences using deep learning(DL)fusion assisted and posterior probability-based moth flames optimization(MFO)is proposed.In the first step,the video frames are resized and finetuned by two pre-trained lightweight DL models,EfficientNetB0 and MobileNetV2.Both models are selected based on the top-5 accuracy and less number of parameters.Later,both models are trained through deep transfer learning and extracted deep features fused using a voting scheme.In the last step,the authors develop a posterior probabilitybased MFO feature selection algorithm to select the best features.The selected features are classified using several supervised learning methods.The CASIA-B publicly available dataset has been employed for the experimental process.On this dataset,the authors selected six angles such as 0°,18°,90°,108°,162°,and 180°and obtained an average accuracy of 96.9%,95.7%,86.8%,90.0%,95.1%,and 99.7%.Results demonstrate comparable improvement in accuracy and significantly minimize the computational time with recent state-of-the-art techniques.
基金supported by the National Natural Science Foundation of China(No.U2037602)。
文摘In order to address the challenges encountered in visual navigation for asteroid landing using traditional point features,such as significant recognition and extraction errors,low computational efficiency,and limited navigation accuracy,a novel approach for multi-type fusion visual navigation is proposed.This method aims to overcome the limitations of single-type features and enhance navigation accuracy.Analytical criteria for selecting multi-type features are introduced,which simultaneously improve computational efficiency and system navigation accuracy.Concerning pose estimation,both absolute and relative pose estimation methods based on multi-type feature fusion are proposed,and multi-type feature normalization is established,which significantly improves system navigation accuracy and lays the groundwork for flexible application of joint absolute-relative estimation.The feasibility and effectiveness of the proposed method are validated through simulation experiments through 4769 Castalia.
基金Under the auspices of the Key Program of National Natural Science Foundation of China(No.42030409)。
文摘Multi-source data fusion provides high-precision spatial situational awareness essential for analyzing granular urban social activities.This study used Shanghai’s catering industry as a case study,leveraging electronic reviews and consumer data sourced from third-party restaurant platforms collected in 2021.By performing weighted processing on two-dimensional point-of-interest(POI)data,clustering hotspots of high-dimensional restaurant data were identified.A hierarchical network of restaurant hotspots was constructed following the Central Place Theory(CPT)framework,while the Geo-Informatic Tupu method was employed to resolve the challenges posed by network deformation in multi-scale processes.These findings suggest the necessity of enhancing the spatial balance of Shanghai’s urban centers by moderately increasing the number and service capacity of suburban centers at the urban periphery.Such measures would contribute to a more optimized urban structure and facilitate the outward dispersion of comfort-oriented facilities such as the restaurant industry.At a finer spatial scale,the distribution of restaurant hotspots demonstrates a polycentric and symmetric spatial pattern,with a developmental trend radiating outward along the city’s ring roads.This trend can be attributed to the efforts of restaurants to establish connections with other urban functional spaces,leading to the reconfiguration of urban spaces,expansion of restaurant-dedicated land use,and the reorganization of associated commercial activities.The results validate the existence of a polycentric urban structure in Shanghai but also highlight the instability of the restaurant hotspot network during cross-scale transitions.
基金supported by the National Natural Science Foundation of China(62302167,62477013)Natural Science Foundation of Shanghai(No.24ZR1456100)+1 种基金Science and Technology Commission of Shanghai Municipality(No.24DZ2305900)the Shanghai Municipal Special Fund for Promoting High-Quality Development of Industries(2211106).
文摘Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images.Obtaining class-specific precise representations at different scales is a key aspect of feature representation.However,existing methods often rely on the single-scale deep feature,neglecting shallow and deeper layer features,which poses challenges when predicting objects of varying scales within the same image.Although some studies have explored multi-scale features,they rarely address the flow of information between scales or efficiently obtain class-specific precise representations for features at different scales.To address these issues,we propose a two-stage,three-branch Transformer-based framework.The first stage incorporates multi-scale image feature extraction and hierarchical scale attention.This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales,improving the model’s generalization to diverse object scales.The second stage includes a global feature enhancement module and a region selection module.The global feature enhancement module strengthens interconnections between different image regions,mitigating the issue of incomplete represen-tations,while the region selection module models the cross-modal relationships between image features and labels.Together,these components enable the efficient acquisition of class-specific precise feature representations.Extensive experiments on public datasets,including COCO2014,VOC2007,and VOC2012,demonstrate the effectiveness of our proposed method.Our approach achieves consistent performance gains of 0.3%,0.4%,and 0.2%over state-of-the-art methods on the three datasets,respectively.These results validate the reliability and superiority of our approach for multi-label image classification.