Funding: Supported by the National Natural Science Foundation of China (Grant No. 62071098) and the Sichuan Science and Technology Program (Grants 2022YFG0319, 2023YFG0301, and 2023YFG0018).
Abstract: In video scene graph generation, spatio-temporal feature extraction and the long-tail effect in relationship classification are core research issues. This paper proposes extracting spatio-temporal features with a global-local Transformer model for video scene graph generation. Methods based on the Transformer architecture and attention mechanism enrich the semantic information of spatio-temporal features in videos, thereby improving the accuracy of relationship classification. In the feature processing module, pose features are introduced to strengthen the semantic representation of objects. In the spatial feature encoding module, a local spatial visibility matrix based on bounding boxes and key points of human pose features is proposed to address the insufficient attention to local details in traditional Transformer encoders. In the temporal feature encoding module, a global random frame extraction strategy is proposed, which captures global temporal features while keeping computational complexity in check. In the relation classification module, to address the uneven distribution of object and relation categories in the Action Genome dataset, a relation classification loss function based on bipartite graph matching and Focal Loss is proposed, which alleviates the long-tail effect in relation classification and improves accuracy.
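The Focal Loss term mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single-label softmax classifier and the standard form FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t). The function names, the default gamma = 2.0, and the example logits are illustrative assumptions, not details taken from the paper, and the bipartite graph matching step is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample (hypothetical helper, not the paper's code).

    p_t is the predicted probability of the true class. The factor
    (1 - p_t) ** gamma shrinks the loss of well-classified samples, so
    frequent, easy classes dominate the total loss less.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, which is focal loss with gamma = 0."""
    return focal_loss(logits, target, gamma=0.0)


# Assumed example: the classifier strongly favors class 0.
logits = [5.0, 0.0, 0.0]
easy_ratio = focal_loss(logits, 0) / cross_entropy(logits, 0)  # true class 0
hard_ratio = focal_loss(logits, 1) / cross_entropy(logits, 1)  # true class 1
```

In this example `easy_ratio` is far below `hard_ratio`: the confidently correct sample is down-weighted much more than the misclassified one, which is the property that makes Focal Loss useful for long-tailed relation categories.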
Funding: This project is supported by the National Electric Power Corporation Foundation of China (No. SPKJ010-27).
Abstract: Target tracking is a typical application of visual servoing technology. Tracking a high-speed target with current visual servo systems remains difficult, so improved visual servoing schemes are needed. A position-based visual servo parallel system is presented for tracking high-speed targets. A local Frenet frame is assigned to the sampling points of the spatial trajectory. Position estimation is formed from the differential features of intrinsic geometry, and orientation estimation is formed by homogeneous transformation. The time spent on searching and processing can be greatly reduced by shifting the window according to the predicted feature locations. Simulation results demonstrate the ability of the system to track a spatially moving object.
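The idea of attaching a local Frenet frame to a sampled trajectory point can be sketched as follows. The abstract does not say how the derivatives are discretized, so this sketch assumes uniformly spaced samples and central finite differences. The function names and the helix test curve are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("degenerate vector: frame undefined (e.g. straight path)")
    return tuple(x / n for x in v)


def frenet_frame(p_prev, p, p_next):
    """Estimate (tangent, normal, binormal) at p from three uniform samples.

    d1 is proportional to the first derivative (central difference) and
    d2 to the second derivative; the scale factors cancel on normalization.
    """
    d1 = tuple(c - a for a, c in zip(p_prev, p_next))
    d2 = tuple(a - 2.0 * b + c for a, b, c in zip(p_prev, p, p_next))
    tangent = normalize(d1)
    binormal = normalize(cross(d1, d2))
    normal = cross(binormal, tangent)  # unit length, since B and T are orthonormal
    return tangent, normal, binormal


def helix(t):
    """Assumed test trajectory: a helix with unit radius."""
    return (math.cos(t), math.sin(t), 0.5 * t)


h = 0.01  # assumed sample spacing
T, N, B = frenet_frame(helix(-h), helix(0.0), helix(h))
```

For the helix at t = 0 the estimated normal points toward the helix axis, approximately (-1, 0, 0), as the analytic Frenet frame predicts. On a straight segment the second difference vanishes and the sketch raises `ValueError`, so a real tracker would need a fallback for that case.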