Funding: Co-supported by the Key Research and Development Plan Project of Sichuan Province, China (No. 2022YFG0153).
Abstract: 6D pose estimation from a single RGB image is important for the safe take-off and landing of aircraft. Due to the large scene and large depth range, existing pose estimation methods deliver unsatisfactory accuracy. To achieve precise 6D pose estimation of aircraft, an end-to-end method using a single RGB image is proposed. In the proposed method, the 2D and 3D information of the aircraft keypoints is used as intermediate supervision, and the 6D pose of the aircraft is recovered from this intermediate information. Specifically, an off-the-shelf object detector is used to detect the Region of Interest (RoI) of the aircraft to eliminate background distractions. The 2D projection and 3D spatial information of the pre-designed aircraft keypoints are predicted by the keypoint coordinate estimator (KpNet). The proposed method is trained in an end-to-end fashion. In addition, to address the lack of related datasets, this paper builds the Aircraft 6D Pose dataset for training and testing, which captures the take-off and landing process of three types of aircraft from 11 views. Compared with the latest Wide-Depth-Range method on this dataset, the proposed method improves the average 3D distance of model points (ADD) metric and the 5° and 5 m metric by 86.8% and 30.1%, respectively. Furthermore, the proposed method runs in 9.30 ms, 61.0% faster than YOLO6D at 23.86 ms.
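The ADD metric cited above averages the 3D distance between model points transformed by the ground-truth pose and by the estimated pose. A minimal sketch in plain Python, using a hypothetical toy point set and poses (the paper's actual model points and thresholds are not reproduced here):

```python
import math

def transform(points, R, t):
    """Apply a 3x3 rotation matrix R and translation t to each 3D point."""
    return [
        [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]
        for p in points
    ]

def add_metric(points, R_gt, t_gt, R_est, t_est):
    """Average 3D distance of model points under ground-truth vs. estimated pose."""
    gt = transform(points, R_gt, t_gt)
    est = transform(points, R_est, t_est)
    return sum(math.dist(p, q) for p, q in zip(gt, est)) / len(points)

# Toy example: identity ground truth vs. a pose shifted 5 mm along x (units in metres).
pts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(add_metric(pts, I, [0, 0, 0], I, [0.005, 0, 0]))  # 0.005
```

A pose is typically counted as correct when this average distance falls below a fraction (commonly 10%) of the object's diameter.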
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2021YFB1715900, the National Natural Science Foundation of China under Grant Nos. 12022117 and 61802406, the Beijing Natural Science Foundation under Grant No. Z190004, the Beijing Advanced Discipline Fund under Grant No. 115200S001, and Alibaba Group through the Alibaba Innovative Research Program.
Abstract: We propose a feature-fusion network for pose estimation directly from RGB images without any depth information. First, we introduce a two-stream architecture consisting of segmentation and regression streams. The segmentation stream processes the spatial embedding features and obtains the corresponding image crop; these features are further coupled with the image crop in the fusion network. Second, we use an efficient perspective-n-point (E-PnP) algorithm in the regression stream to extract robust spatial features between 3D and 2D keypoints. Finally, we perform iterative refinement with an end-to-end mechanism to improve the estimation performance. We conduct experiments on two public datasets, YCB-Video and the challenging Occluded-LineMOD. The results show that our method outperforms state-of-the-art approaches in both speed and accuracy.
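PnP-style methods such as the E-PnP step above recover a pose (R, t) from correspondences between 3D model keypoints and their 2D detections. As an illustration of the forward model that PnP inverts, here is a pinhole projection of hypothetical keypoints (camera intrinsics and keypoints are made up for the example, not taken from the paper):

```python
def project(point3d, R, t, fx, fy, cx, cy):
    """Pinhole projection: X_cam = R @ X + t, then perspective divide and intrinsics."""
    X = [sum(R[i][k] * point3d[k] for k in range(3)) + t[i] for i in range(3)]
    return (fx * X[0] / X[2] + cx, fy * X[1] / X[2] + cy)

# Hypothetical object 2 m in front of the camera, identity orientation.
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0.0, 0.0, 2.0]
keypoints3d = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]
uv = [project(p, R, t, 800, 800, 320, 240) for p in keypoints3d]
print(uv)  # [(320.0, 240.0), (360.0, 240.0), (320.0, 280.0)]
```

In practice, given `keypoints3d` and detected `uv`, one would solve for (R, t) with an EPnP solver, e.g. OpenCV's `cv2.solvePnP` with the `SOLVEPNP_EPNP` flag; the paper embeds this step in a learned, end-to-end pipeline.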
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2018YFB1305300), the China Postdoctoral Science Foundation (Grant Nos. 2020TQ0039 and 2021M700425), and the National Natural Science Foundation of China (Grant Nos. 61733001, 62103054, U2013602, 61873039, U1913211 and U1713215).
Abstract: In unstructured environments such as disaster sites and mine tunnels, it is a challenge for robots to estimate the poses of objects under complex lighting, which limits their operation. Owing to the shadows produced by a point light source, the brightness of the operation scene is severely unbalanced, and it is difficult to accurately extract object features. It is especially difficult to accurately label the poses of objects with weak corners and textures. This study proposes an automatic pose annotation method for such objects, which combines 3D-2D matching projection and rendering technology to improve the efficiency of dataset annotation. A 6D object pose estimation method under low-light conditions (LP_TGC) is then proposed, including (1) a light preprocessing neural network model based on a low-light preprocessing module (LPM) to balance the brightness of a picture and improve its quality; and (2) a 6D pose estimation model (TGC) based on keypoint matching. Four typical datasets are constructed to verify our method, and the experimental results demonstrate the effectiveness of the proposed LP_TGC method. The estimation model operating on the preprocessed image can accurately estimate object poses in the mentioned unstructured environments, improving accuracy by an average of ~3% on the ADD metric.
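The LPM described above is a learned neural module, but the brightness-balancing idea it serves can be illustrated with a classical stand-in: gamma correction, sketched here on an 8-bit grayscale image represented as nested lists (purely illustrative; this is not the paper's LPM):

```python
def gamma_correct(image, gamma):
    """Brighten (gamma < 1) or darken (gamma > 1) an 8-bit grayscale image."""
    # Precompute a 256-entry lookup table so each pixel is a single index.
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[v] for v in row] for row in image]

# Underexposed 2x2 toy image; gamma 0.5 lifts the dark values.
dark = [[10, 40], [80, 160]]
print(gamma_correct(dark, 0.5))  # [[50, 101], [143, 202]]
```

Note how the darkest pixels gain the most (10 → 50) while bright ones change less (160 → 202), which is the kind of rebalancing that helps a downstream keypoint estimator see features in shadowed regions.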