Funding: Supported by the National Level Project of China (No. 52-L0D01-0613-20/22).
Abstract: The geolocation of ground targets by airborne image sensors is an important task for unmanned aerial vehicles and surveillance aircraft. This paper proposes an Iterative Geolocation method based on Cross-view Image Registration (IGCIR) that provides real-time target location results with high precision. The proposed method has two key features. First, a cross-view image registration process is introduced, comprising a projective transformation and a two-stage multi-sensor registration. This process exploits both the gradient information and the phase information of the cross-view images, allowing the registration to strike a good balance between matching precision and computational efficiency. By matching the airborne camera view to a preloaded digital map, the geolocation accuracy can reach the accuracy level of the digital map for any ground target appearing in the camera view. Second, the method uses the registration results in an iteration process that compensates online for the bias of the strap-down inertial navigation module. Although it is challenging to provide cross-view registration results at high frequency, this iteration process allows the method to generate real-time, highly accurate location results. The effectiveness of IGCIR is verified by a series of flight-test experiments. The results show that the location accuracy can reach 4.18 m at a 10 km standoff distance.
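The iteration process described above can be sketched as a simple online bias filter: each cross-view registration yields a noisy observation of the navigation bias, which is blended into a running estimate. This is a minimal illustration under assumed simplifications (flat terrain, registration offsets expressed directly in metres, a hypothetical first-order filter gain); it is not the paper's actual algorithm.

```python
import numpy as np

def geolocate(target_offset, cam_pos, bias_est):
    """Locate a ground target from its metric offset in the camera view,
    after subtracting the current bias estimate (flat-terrain sketch)."""
    return cam_pos + target_offset - bias_est

def update_bias(bias_est, registration_offset, gain=0.5):
    """Blend each new cross-view registration offset into the running
    bias estimate (hypothetical first-order filter standing in for the
    paper's iteration process)."""
    return bias_est + gain * (registration_offset - bias_est)

# Simulated INS bias of (30 m, -20 m); registration observes it with noise.
rng = np.random.default_rng(0)
true_bias = np.array([30.0, -20.0])
bias_est = np.zeros(2)
for _ in range(20):
    observed = true_bias + rng.normal(0, 2.0, size=2)  # noisy registration
    bias_est = update_bias(bias_est, observed)

residual = np.linalg.norm(bias_est - true_bias)
```

After a handful of registrations, the residual bias error drops well below the initial 36 m offset, which is why infrequent registrations can still support real-time geolocation.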
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 62175111 and 62001234).
Abstract: This work matches remote sensing images taken by an unmanned aerial vehicle (UAV) against satellite remote sensing images that carry geolocation information, thereby determining the specific geographic location of the target object captured by the UAV. The main challenge is the considerable difference in the visual content of remote sensing images acquired by satellites and UAVs, such as dramatic changes in viewpoint and unknown orientations. Much of the previous work has focused on image matching of homologous data. To overcome the difficulties caused by the gap between these two data modes and to maintain robustness in visual positioning, a quality-aware template matching method based on scale-adaptive deep convolutional features is proposed, deeply mining their common features. The template feature map and the reference image feature map are first obtained, and the two feature maps are then used to measure similarity. Finally, a heatmap representing the matching probability is generated to determine the best match in the reference image. The method is applied to the latest UAV-based geolocation dataset (the University-1652 dataset) and to real-scene campus data we collected with UAVs. The experimental results demonstrate the effectiveness and superiority of the method.
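The core matching step above — sliding a template feature map over a reference feature map and scoring similarity into a heatmap — can be sketched as plain cosine-similarity template matching. This is an assumed simplification (no quality weighting, no scale adaptation, illustrative names only), not the paper's network.

```python
import numpy as np

def match_heatmap(template_feat, reference_feat):
    """Slide a template feature map over a reference feature map and
    return a cosine-similarity heatmap (sliding-window sketch of
    feature-map matching; the quality-aware weighting is omitted)."""
    th, tw, c = template_feat.shape
    rh, rw, _ = reference_feat.shape
    t = template_feat.ravel()
    t = t / (np.linalg.norm(t) + 1e-8)
    heat = np.zeros((rh - th + 1, rw - tw + 1))
    for y in range(heat.shape[0]):
        for x in range(heat.shape[1]):
            w = reference_feat[y:y + th, x:x + tw].ravel()
            heat[y, x] = t @ (w / (np.linalg.norm(w) + 1e-8))
    return heat

# Toy check: plant the template inside a low-amplitude noisy reference map.
rng = np.random.default_rng(1)
template = rng.standard_normal((4, 4, 8))
reference = rng.standard_normal((16, 16, 8)) * 0.1
reference[5:9, 7:11] = template
heat = match_heatmap(template, reference)
best = np.unravel_index(np.argmax(heat), heat.shape)
```

The argmax of the heatmap recovers the planted location, mirroring how the best match is read off the probability heatmap in the abstract.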
Abstract: Forecasting plays a vital role in modern economic and industrial fields, and tourism demand forecasting is an important part of intelligent tourism. This paper proposes a simple data modeling method and a combined cross-view model that are easy to implement yet very effective. The proposed method is applied to BPNN and SVR algorithms. A real tourism data set from the Small Wild Goose Pagoda is used to verify the feasibility of the proposed method, together with an analysis of the impact of year, season, and week on tourism demand forecasting. Comparative experiments show that the proposed model achieves better accuracy than the baseline methods.
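One common way to combine two base forecasters such as a BPNN and an SVR is to weight their predictions inversely to their validation error. The abstract does not specify its combination rule, so the weighting below is purely an assumed illustration.

```python
import numpy as np

def combine_forecasts(preds_a, preds_b, y_val, val_a, val_b):
    """Weight two base forecasters inversely to their validation MSE
    (hypothetical combination rule; the paper's own rule is not given
    in the abstract)."""
    mse_a = np.mean((val_a - y_val) ** 2)
    mse_b = np.mean((val_b - y_val) ** 2)
    w_a = (1.0 / mse_a) / (1.0 / mse_a + 1.0 / mse_b)
    return w_a * preds_a + (1.0 - w_a) * preds_b

# Toy validation set: model A is off by +0.1, model B by -0.3 everywhere.
y_val = np.ones(5)
val_a, val_b = y_val + 0.1, y_val - 0.3
combined = combine_forecasts(val_a, val_b, y_val, val_a, val_b)
err = np.abs(combined - y_val).mean()
```

Here the inverse-MSE weights (0.9 and 0.1) pull the combined error to 0.06, below either base model's error, which is the usual motivation for such combinations.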
Funding: Financed in part by the Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior-Brasil (CAPES) (88887.929508/2023-00 and 88887.937224/2024-00) and partially funded by the National Research Council of Brazil (CNPq) (307525/2022-8).
Abstract: Geolocalization is a crucial process that leverages environmental information and contextual data to accurately identify a position. In particular, cross-view geolocalization utilizes images from various perspectives, such as satellite and ground-level images, and is relevant for applications like robot navigation and autonomous navigation. In this research, we propose a methodology that integrates cross-view geolocalization estimation with a land-cover semantic segmentation map. Our solution demonstrates performance comparable to state-of-the-art methods, exhibiting enhanced stability and consistency regardless of the street-view location or the dataset used. Additionally, our method generates a focused discrete probability distribution that acts as a heatmap, effectively filtering out incorrect and unlikely regions and enhancing the reliability of our estimations. Code is available at https://github.com/nathanxavier/CVSegGuide.
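The "focused discrete probability distribution" above can be sketched as a masked softmax over a search grid: cells whose land-cover class contradicts the query are zeroed before normalization. The boolean mask here is an assumed stand-in for the semantic-segmentation guidance.

```python
import numpy as np

def location_probabilities(scores, valid_mask):
    """Convert raw matching scores over a search grid into a discrete
    probability heatmap, zeroing cells the land-cover map rules out
    (masked-softmax sketch; the mask stands in for the segmentation map)."""
    z = np.where(valid_mask, scores, -np.inf)
    z = z - z.max()          # subtract max for numerical stability
    p = np.exp(z)            # exp(-inf) -> 0, so masked cells get zero mass
    return p / p.sum()

scores = np.array([[1.0, 2.0],
                   [3.0, 0.5]])
mask = np.array([[True, True],
                 [False, True]])   # cell (1,0) ruled out despite top score
p = location_probabilities(scores, mask)
```

Note how the highest raw score at (1, 0) receives zero probability once the mask filters it out, which is exactly the "filtering out incorrect and unlikely regions" behavior the abstract describes.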
Funding: Partially supported by the National Natural Science Foundation of China (No. 61872317) and Face Unity Technology.
Abstract: We present a multiview method for markerless motion capture of multiple people. The main challenge in this problem is to determine cross-view correspondences for the 2D joints in the presence of noise. We propose a 3D hypothesis clustering technique to solve this problem. The core idea is to transform joint matching in 2D space into a clustering problem in a 3D hypothesis space. In this way, evidence from photometric appearance, multiview geometry, and bone length can be integrated to solve the clustering problem efficiently and robustly. Each cluster encodes a set of matched 2D joints for the same person across different views, from which the 3D joints can be effectively inferred. We then assemble the inferred 3D joints into full-body skeletons for all persons in a bottom-up way. Our experiments demonstrate the robustness of our approach even in challenging cases with heavy occlusion, closely interacting people, and few cameras. We have evaluated our method on many datasets, and the results show that it yields significantly lower estimation errors than many state-of-the-art methods.
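The clustering step above can be sketched as grouping triangulated 3D joint candidates by spatial proximity: hypotheses that agree in 3D come from the same person's joint. This minimal greedy radius clustering ignores the appearance and bone-length evidence the paper integrates, and the 0.3 m radius is an assumed threshold.

```python
import numpy as np

def cluster_hypotheses(points, radius=0.3):
    """Greedy radius clustering of 3D joint hypotheses: each cluster
    gathers triangulated candidates that agree in 3D space (minimal
    stand-in for the paper's hypothesis clustering)."""
    clusters = []
    for p in points:
        for c in clusters:
            if np.linalg.norm(p - np.mean(c, axis=0)) < radius:
                c.append(p)      # joins the first sufficiently close cluster
                break
        else:
            clusters.append([p])  # otherwise seeds a new cluster
    return [np.mean(c, axis=0) for c in clusters]

# Two people ~1 m apart; three camera pairs each triangulate the same joint
# with ~2 cm noise, giving six hypotheses that should form two clusters.
rng = np.random.default_rng(2)
person_a = np.array([0.0, 0.0, 1.5])
person_b = np.array([1.0, 0.2, 1.5])
hyps = np.array([p + rng.normal(0, 0.02, 3)
                 for p in [person_a] * 3 + [person_b] * 3])
centers = cluster_hypotheses(hyps)
```

Each cluster mean then serves as the inferred 3D joint, from which the bottom-up skeleton assembly proceeds.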
Abstract: Remarkable progress has been made in self-supervised monocular depth estimation (SS-MDE) by exploring cross-view consistency, e.g., photometric consistency and 3D point cloud consistency. However, these consistencies are very vulnerable to illumination variations, occlusions, texture-less regions, and moving objects, making them not robust enough to deal with various scenes. To address this challenge, we study two kinds of robust cross-view consistency in this paper. First, the spatial offset field between adjacent frames is obtained by reconstructing the reference frame from its neighbors via deformable alignment, and is used to align the temporal depth features via a depth feature alignment (DFA) loss. Second, the 3D point clouds of each reference frame and its nearby frames are calculated and transformed into voxel space, where the point density in each voxel is calculated and aligned via a voxel density alignment (VDA) loss. In this way, we exploit the temporal coherence in both depth feature space and 3D voxel space for SS-MDE, shifting the "point-to-point" alignment paradigm to a "region-to-region" one. Compared with the photometric consistency loss and the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features and the high tolerance of voxel density to the aforementioned challenges. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques. Extensive ablation studies and analyses validate the effectiveness of the proposed losses, especially in challenging scenes. The code and models are available at https://github.com/sunnyHelen/RCVC-depth.
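The voxel density alignment idea above can be sketched directly: voxelize each frame's point cloud, count points per voxel, and penalize the difference between the two density volumes. The voxel size, grid extent, and L1 penalty below are assumed illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def voxel_density(points, origin, voxel=0.5, grid=(8, 8, 8)):
    """Count points per voxel, producing the density volume that a VDA-style
    loss compares between frames (assumed 0.5 m voxels on an 8x8x8 grid)."""
    idx = np.floor((points - origin) / voxel).astype(int)
    dens = np.zeros(grid)
    for i in idx:
        if np.all(i >= 0) and np.all(i < np.array(grid)):
            dens[tuple(i)] += 1   # points outside the grid are ignored
    return dens

def vda_loss(d_ref, d_src):
    """L1 alignment of two density volumes ("region-to-region" comparison)."""
    return np.abs(d_ref - d_src).mean()

rng = np.random.default_rng(3)
cloud = rng.uniform(0, 4, size=(500, 3))
d1 = voxel_density(cloud, origin=np.zeros(3))
d2 = voxel_density(cloud + 0.01, origin=np.zeros(3))  # tiny rigid shift
loss_self = vda_loss(d1, d1)
loss_small = vda_loss(d1, d2)
```

Because most points stay in the same voxel under the 1 cm shift, the density loss barely changes, illustrating the tolerance to small misalignments that makes voxel density more robust than point-to-point comparison.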
Funding: Partially supported by the Shenzhen Key Laboratory of Navigation and Communication Integration (No. ZDSYS20210623091807023).
Abstract: Accurate localization is critical for lunar rovers exploring lunar terrain features. Traditionally, lunar rover localization relies on sensor data from odometers, inertial measurement units, and stereo cameras. However, localization errors accumulate over long traverses, limiting the rover's localization accuracy. This paper presents a metric localization framework based on cross-view images (the ground view from a rover and the air view from an orbiter) to eliminate accumulated localization errors. First, we employ perspective projection to reduce the geometric differences between the cross-view images. Then, we propose an image-based metric localization network to extract image features and generate a location heatmap, which serves as the basis for accurate estimation of query locations. We also create the first large-area lunar cross-view image (Lunar-CV) dataset to evaluate localization performance. The dataset consists of 30 digital orthophoto maps (DOMs) with a resolution of 7 m/pixel, collected by the Chang'e-2 lunar orbiter, along with 8100 simulated rover panoramas. Experimental results on the Lunar-CV dataset demonstrate the superior performance of the proposed framework: compared with the second-best method, it reduces the average localization error by 26% and the median localization error by 22%.
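The perspective-projection step above, which brings the rover's ground view closer to the orbiter's overhead view, can be sketched for flat terrain: each panorama ray below the horizon is intersected with the ground plane to get a bird's-eye coordinate. The camera height is an assumed mast height, not a figure from the paper.

```python
import numpy as np

def panorama_to_ground(az, elev, cam_height=1.5):
    """Project a panorama ray (azimuth, elevation angle below the horizon,
    both in radians) onto a flat ground plane, yielding the overhead (x, y)
    coordinate of the terrain point the pixel sees (flat-terrain sketch;
    cam_height is an assumed mast height in metres)."""
    r = cam_height / np.tan(elev)          # ground range of the ray hit point
    return np.array([r * np.cos(az), r * np.sin(az)])

# A ray 45 degrees below the horizon, straight ahead, hits the ground
# at the same range as the camera height.
pt = panorama_to_ground(az=0.0, elev=np.pi / 4)
```

Applying this per pixel warps the panorama into an overhead image that can be compared against orbital DOM tiles, which is the geometric normalization the framework relies on before feature matching.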
Abstract: [Objective] Cross-view object-level geolocalization (CVOGL) aims to precisely locate, in satellite imagery, the geographic position of a target observed in ground-level street-view or UAV imagery. Existing methods mostly focus on image-level matching, establishing cross-view associations by processing whole images globally; they lack position encoding for the specific target, so the model's attention cannot be guided to the object of interest. Moreover, because the coverage of the reference image varies, the query target occupies an extremely small pixel fraction of the corresponding satellite image, making precise localization difficult. [Methods] To address these problems, this paper proposes a Cross-View Object-Level Geo-Localization Method with Gaussian Kernel Function and Heterogeneous Spatial Contrastive Loss (GHGeo) for precisely locating the target of interest. The method first encodes the query target's position with a Gaussian kernel function, achieving fine-grained modeling of the target's center point and its distribution. It further introduces a dynamic attention refinement fusion module that dynamically weights the spatial similarity between cross-perceived global context and local geometric features, predicting the precise position of the query target in the satellite image as a probability density. Finally, a heterogeneous spatial contrastive loss constrains the training process and mitigates cross-view feature discrepancies. [Results] Experiments on the CVOGL dataset show that, on the drone-to-satellite task, GHGeo achieves localization accuracies of 67.73% and 63.00% at intersection-over-union (IoU) thresholds of ≥25% and ≥50%, improvements of 5.76% and 5.34% over the baseline method DetGeo; on the street-view-to-satellite task, the accuracies at the same IoU thresholds are 48.41% and 45.43%, improvements of 2.98% and 3.19% over DetGeo. Compared with TransGeo, SAFA, and VAGeo on the CVOGL dataset, GHGeo likewise shows higher localization accuracy. [Conclusion] The proposed method effectively improves the accuracy of cross-view object-level geolocalization, providing key technical support and precise location information for applications such as urban planning monitoring and emergency rescue dispatch.
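The Gaussian-kernel position encoding described above can be sketched as rendering the query target's center as a 2D Gaussian heatmap, so the peak marks the center and the spread models its distribution. The map size and sigma below are assumed illustrative values.

```python
import numpy as np

def gaussian_target_map(h, w, cy, cx, sigma=2.0):
    """Encode a query target's center (cy, cx) as a 2D Gaussian heatmap,
    the position-encoding step the abstract describes (sigma is an
    assumed spread in pixels)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

hm = gaussian_target_map(32, 32, cy=10, cx=20)
peak = np.unravel_index(np.argmax(hm), hm.shape)
```

The heatmap peaks at exactly the encoded center with value 1 and decays smoothly around it, giving the model a soft, differentiable target to attend to instead of a single hard pixel.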