Funding: Supported by the National Key R&D Program of China (No. 2022ZD0118402).
Abstract: Remote sensing cross-modal image-text retrieval (RSCIR) can flexibly and subjectively retrieve remote sensing images using query text, and it has recently attracted growing research attention. However, as the parameter count of visual-language pre-training models increases, direct transfer learning consumes a substantial amount of computational and storage resources. Moreover, recently proposed parameter-efficient transfer learning methods mainly focus on reconstructing channel features and ignore the spatial features that are vital for modeling key entity relationships. To address these issues, we design an efficient transfer learning framework for RSCIR based on spatial feature efficient reconstruction (SPER). A concise and efficient spatial adapter is introduced to enhance the extraction of spatial relationships. The spatial adapter spatially reconstructs the features in the backbone with few parameters while incorporating prior information from the channel dimension. We conduct quantitative and qualitative experiments on two commonly used RSCIR datasets. Compared with traditional methods, our approach improves the sumR metric by 3%-11%. Compared with methods that fine-tune all parameters, our method trains less than 1% of the parameters while maintaining about 96% of the overall performance.
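The abstract reports results in sumR without defining it. As a rough sketch, assuming the convention common in image-text retrieval (sumR as the sum of Recall@1, Recall@5, and Recall@10 in both retrieval directions) and assuming a one-to-one pairing in which image i matches text i, the metric could be computed as follows. The function names `recall_at_k` and `sum_r` are hypothetical and are not taken from the paper.

```python
def recall_at_k(sim, k):
    """Percentage of queries (rows) whose true match ranks in the top k.

    Assumption: the true match of query i is candidate i (one-to-one pairing).
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank = number of candidates scoring strictly higher than the true match.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def sum_r(sim, ks=(1, 5, 10)):
    """Sum of Recall@k over both directions (rows as queries, then columns)."""
    transposed = [list(col) for col in zip(*sim)]
    return sum(recall_at_k(sim, k) + recall_at_k(transposed, k) for k in ks)


# Toy similarity matrix: rows are text queries, columns are images.
sim = [
    [0.9, 0.1],
    [0.8, 0.2],
]
print(sum_r(sim, ks=(1,)))
```

Real RSCIR datasets usually pair several captions with each image, so the ground-truth lookup would need to be adapted accordingly.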
Funding: Supported by Shenzhen University [860-000002111308].
Abstract: The subsurface of cities is becoming increasingly congested. Timely records of subsurface structures are vital for the maintenance and management of urban infrastructure beneath or above the ground. Ground-penetrating radar (GPR) is a nondestructive testing method that can survey and image the subsurface without excavation. However, the interpretation of GPR data relies on the operator's experience. An automatic workflow is proposed for recognizing and classifying subsurface structures with GPR using computer vision and machine learning techniques. The workflow comprises three stages: first, full-cover GPR measurements are processed to form C-scans; second, abnormal areas are extracted from the full-cover C-scans with a coefficient of variation-active contour model (CV-ACM); finally, the extracted segments are recognized and classified from the corresponding B-scans with aggregate channel features (ACF) to produce a semantic map. The selected computer vision methods were validated by a controlled laboratory test, and the entire workflow was evaluated with a real, on-site case study. The results of both the controlled test and the on-site case were promising. This study establishes the necessity of a full-cover 3D GPR survey, illustrates the feasibility of integrating advanced computer vision techniques to analyze large amounts of 3D GPR survey data, and paves the way for automating subsurface modeling with GPR.
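To make the second stage more concrete, the sketch below computes a local coefficient of variation (standard deviation divided by mean) over a small C-scan grid. It is only an illustration under stated assumptions: the amplitudes are taken to be non-negative (for example, an envelope or absolute amplitude), the window radius is arbitrary, and the active contour step of CV-ACM is not implemented. The function name `local_cv` is hypothetical and does not come from the paper.

```python
from statistics import mean, pstdev


def local_cv(grid, radius=1):
    """Coefficient of variation in a square window around each cell.

    Assumption: grid values are non-negative amplitudes; windows are
    clipped at the grid border rather than padded.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            m = mean(window)
            # Guard against division by zero in empty (all-zero) regions.
            out[r][c] = pstdev(window) / m if m else 0.0
    return out


# Toy C-scan slice: uniform background with one strong reflector in the center.
c_scan = [
    [1, 1, 1],
    [1, 10, 1],
    [1, 1, 1],
]
cv_map = local_cv(c_scan)
print(cv_map[1][1])
```

High values in such a map mark locally heterogeneous regions; in the described workflow, a contour model rather than a fixed threshold would delineate the abnormal areas.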
Funding: Supported by the National Natural Science Foundation of China under Grants No. 62072135 and No. 61672181.
Abstract: Skin melanoma is one of the most common malignant tumors originating from melanocytes, and its incidence in the Chinese population is rising continuously. Early and accurate diagnosis of melanoma is of great significance for guiding clinical treatment. However, the symptoms of malignant melanoma are not obvious in the early stage, so it is difficult to diagnose by human observation, and it spreads easily when the diagnosis is missed. To diagnose melanoma accurately, an end-to-end skin lesion attribute segmentation framework is presented in this paper. It is applied to facilitate the digitalization of attribute segmentation. The framework improves on the U-Net architecture by using a channel context feature fusion module between the encoder and decoder to further merge context information. A dual-domain attention module is proposed to extract more effective information from the feature map. The results show that the proposed method effectively segments the lesion attributes and achieves good results on the ISIC 2018 Task 2 dataset.