Abstract: On 5 March 2021, the 4th Session of the 13th National People's Congress was held in the Great Hall of the People, Beijing. Premier Li Keqiang, on behalf of the State Council, delivered the government work report to the session; the key points follow.
Funding: Supported by the National Natural Science Foundation of China (60905012)
Abstract: To improve the performance of the scale-invariant feature transform (SIFT), a modified SIFT (M-SIFT) descriptor is proposed to realize fast and robust key-point extraction and matching. In descriptor generation, 3 rotation-invariant concentric-ring grids around the key-point location are used instead of the 16 square grids of the original SIFT. Then, 10 orientations are accumulated for each grid, which results in a 30-dimension descriptor. In descriptor matching, a rough mismatch rejection step is proposed based on the difference in grey information between matching points. The performance of the proposed method is tested for image mosaicking on simulated and real-world images. Experimental results show that the M-SIFT descriptor inherits SIFT's invariance to image scale and rotation, illumination change and affine distortion. Besides, the time cost of feature extraction is reduced by 50% compared with the original SIFT, and the rough mismatch rejection removes at least 70% of mismatches. The results also demonstrate that the proposed M-SIFT method is superior to other improved SIFT methods in speed and robustness.
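The descriptor layout described above (3 concentric rings, 10 orientation bins each, giving a 30-dimension vector) and the grey-difference rejection can be sketched as follows. This is a minimal illustration, not the paper's implementation: the ring radii, the grey-difference threshold, and the input format (pixels given as precomputed gradient magnitude/orientation tuples) are all assumptions.

```python
import math

def m_sift_descriptor(pixels, center, dominant_ori, radii=(4.0, 8.0, 12.0)):
    """Sketch of a 30-D M-SIFT-style descriptor: 3 concentric rings x 10
    orientation bins around a key-point (ring radii are illustrative)."""
    n_rings, n_bins = 3, 10
    desc = [0.0] * (n_rings * n_bins)
    cx, cy = center
    for x, y, mag, ori in pixels:
        r = math.hypot(x - cx, y - cy)
        # Assign the pixel to a ring by its distance from the key-point;
        # concentric rings make the spatial grid itself rotation-invariant.
        ring = next((i for i, rad in enumerate(radii) if r <= rad), None)
        if ring is None:
            continue  # pixel lies outside the outermost ring
        # Measure gradient orientation relative to the dominant orientation,
        # then accumulate the magnitude into one of 10 orientation bins.
        rel = (ori - dominant_ori) % (2 * math.pi)
        b = min(int(rel / (2 * math.pi / n_bins)), n_bins - 1)
        desc[ring * n_bins + b] += mag
    # L2-normalize for robustness to illumination change.
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]

def rough_reject(grey_a, grey_b, thresh=10):
    """Sketch of rough mismatch rejection: discard a candidate match when
    the grey-level difference between the two matched points exceeds a
    threshold (the threshold value here is an assumption)."""
    return abs(grey_a - grey_b) > thresh
```

Because each pixel contributes to a ring chosen purely by distance from the key-point, rotating the image permutes nothing in the spatial grid; only the orientation bins shift, and the relative-orientation step cancels that shift.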
Funding: Jointly supported by the National Natural Science Foundation of China (Nos. 62236010, 62322607, 62276261 and 62076014), the Youth Innovation Promotion Association of the Chinese Academy of Sciences, China (No. 2021128), the Joint Fund of Natural Science of Hunan Province, China (No. 2023JJ50242), and the Key Projects of the Education Department of Hunan Province, China (No. 22A0115).
Abstract: Most visual-language navigation (VLN) research focuses on simulated environments, but applying these methods to real-world scenarios is challenging because of misalignments between vision and language in complex environments, leading to path deviations. To address this, we propose a novel vision-and-language object navigation strategy that uses multimodal pretrained knowledge as a cross-modal bridge to link semantic concepts in both images and text. This improves navigation supervision at key-points and enhances robustness. Specifically, we 1) randomly generate key-points within a specific density range and optimize them on the basis of challenging locations; 2) use pretrained multimodal knowledge to efficiently retrieve target objects; 3) combine depth information with simultaneous localization and mapping (SLAM) map data to predict optimal positions and orientations for accurate navigation; and 4) implement the method on a physical robot, successfully conducting navigation tests. Our approach achieves a maximum success rate of 66.7%, outperforming existing VLN methods in real-world environments.
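Step 1 of the pipeline above, generating key-points at random within a target density range over the navigable map, can be sketched as below. This is an illustrative reading under stated assumptions: the abstract does not give the density units or range, so the points-per-square-metre interpretation, the `density_range` default, and the free-cell input format are all hypothetical.

```python
import random

def generate_keypoints(free_cells, area_m2, density_range=(0.5, 1.5), seed=0):
    """Sketch of random key-point generation within a density range.

    free_cells: list of navigable (x, y) cells from the SLAM map
    area_m2: navigable area in square metres
    density_range: assumed target density in key-points per square metre
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    lo, hi = density_range
    # Draw a target density inside the allowed range, convert it to a
    # point count, and sample that many distinct navigable cells.
    target_density = rng.uniform(lo, hi)
    n = max(1, round(target_density * area_m2))
    return rng.sample(free_cells, min(n, len(free_cells)))
```

The abstract's follow-up step, optimizing these points toward challenging locations, would then re-weight or move the sampled cells; that refinement is method-specific and is not sketched here.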