Abstract: The article "How to Understand These Three Sentences" in your journal (1995, No. 12, p. 30) was very instructive. However, one note in that article states: "When suppose or supposing introduces a conditional adverbial clause, it is used only in questions." This author finds that claim untenable. Consider the following counterexamples: Suppose white were black, you might be right. (假如白的即是黑的,那末你或许就对了。) (《英汉大词典》, Vol. 2, p. 3490) Suppose (Supposing) you miss your tiger, he is not likely to miss you. (你如果打不着老虎,老虎不见得吃不着你。) (《英华大词典》, revised 2nd edition, p. 1399)
Funding: Supported in part by the National Natural Science Foundation of China under Grants 61973065, U20A20197, and 61973063.
Abstract: Previous multi-view 3D human pose estimation methods neither correlate different human joints within each view nor explicitly model learnable correlations between the same joints across views, meaning that skeleton structure information is not utilized and multi-view pose information is not fully fused. Moreover, existing graph convolutional operations do not consider the specificity of different joints and different views of pose information when processing skeleton graphs, so the correlation weights between nodes in the graph and their neighborhood nodes are shared. Existing Graph Convolutional Networks (GCNs) cannot efficiently extract global and deep-level skeleton structure information and view correlations. To solve these problems, pre-estimated multi-view 2D poses are organized into a multi-view skeleton graph that fuses skeleton priors and view correlations explicitly to address the occlusion problem, with the skeleton-edge and symmetry-edge representing the structural correlations between adjacent joints in each view of the skeleton graph, and the view-edge representing the view correlations between the same joints in different views. To let the graph convolution operation mine elaborate and sufficient skeleton structure information and view correlations, different correlation weights are assigned to different categories of neighborhood nodes and further to each node in the graph. Based on this graph convolution operation, a Residual Graph Convolution (RGC) module is designed as the basic module and combined with a simplified Hourglass architecture to construct Hourglass-GCN, our 3D pose estimation network. Hourglass-GCN, with a symmetrical and concise architecture, processes three scales of multi-view skeleton graphs to efficiently extract local-to-global scale and shallow-to-deep level skeleton features. Experimental results on the common large-scale 3D pose datasets Human3.6M and MPI-INF-3DHP show that Hourglass-GCN outperforms several strong methods in 3D pose estimation accuracy.
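The category-specific correlation weights described above map naturally onto a graph convolution with one weight matrix per edge type. Below is a minimal PyTorch sketch of that idea, assuming row-normalized adjacency matrices for the skeleton-, symmetry-, and view-edge categories; the class and argument names are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

class EdgeTypedGraphConv(nn.Module):
    """Graph convolution with a separate weight matrix per neighbor
    category (e.g. skeleton-edge, symmetry-edge, view-edge)."""
    def __init__(self, in_dim, out_dim, adjacency_per_type):
        super().__init__()
        # adjacency_per_type: list of (J, J) tensors, one per edge category
        self.register_buffer("adjs", torch.stack(adjacency_per_type))
        self.linears = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False) for _ in adjacency_per_type])
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):                              # x: (batch, J, in_dim)
        out = 0
        for adj, lin in zip(self.adjs, self.linears):
            deg = adj.sum(-1, keepdim=True).clamp(min=1)   # row-normalize
            out = out + (adj / deg) @ lin(x)               # aggregate per edge type
        return out + self.bias
```

For V views of J joints, the graph would have V·J nodes; the view-edge adjacency links the copies of each joint across views, so its weight matrix learns cross-view fusion separately from the skeletal weights.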
Funding: This research was supported by Grant Education Sciences Planning (JG10DB223), "Experimental research on the development of pupils' problem posing ability in Shenyang City", from the Research Fund of the Shenyang Educational Committee, and by Grant GOA 2012/10, "Number sense: Analysis and improvement", from the Research Fund of the Katholieke Universiteit Leuven, Belgium.
Abstract: The goal of the present study is to investigate the relationship between pupils' problem posing and problem solving abilities, their beliefs about problem posing and problem solving, and their general mathematics abilities, in a Chinese context. Five instruments, i.e., a problem posing test, a problem solving test, a problem posing questionnaire, a problem solving questionnaire, and a standard achievement test, were administered to 69 Chinese fifth-grade pupils to assess these five variables and analyze their mutual relationships. Results revealed strong correlations between pupils' problem posing and problem solving abilities and beliefs, and their general mathematical abilities.
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62071345.
Abstract: Self-supervised monocular depth estimation has emerged as a major research focus in recent years, primarily due to the elimination of ground-truth depth dependence. However, the prevailing architectures in this domain suffer from inherent limitations: existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions. These assumptions are often violated in real-world scenarios by dynamic objects, non-Lambertian reflectance, and unstructured background elements, leading to pervasive artifacts such as depth discontinuities ("holes"), structural collapse, and ambiguous reconstruction. To address these challenges, we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network, enhancing its ability to model complex scene dynamics. Our contributions are threefold: (1) a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations; (2) a physically informed loss function that couples dynamic pose and depth predictions, designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles; (3) an efficient SE(3) transformation parameterization that streamlines network complexity and temporal pre-processing. Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity, significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.
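For context, the self-supervised objective such frameworks build on warps one view into another using predicted depth and an SE(3) pose, then penalizes photometric error. The NumPy sketch below shows that geometric core under a single rigid transform; a pixel-wise dynamic pose module of the kind proposed here would replace the one matrix T with a per-pixel transformation. Function and variable names are illustrative assumptions, not the paper's API.

```python
import numpy as np

def reproject(depth, K, T):
    """Map each source pixel into the target view given per-pixel depth.
    depth: (H, W), K: (3, 3) intrinsics, T: (4, 4) SE(3) camera motion.
    Returns target pixel coordinates of shape (H, W, 2)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, HW)
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)     # backproject to 3D
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])    # homogeneous coords
    proj = K @ (T @ cam_h)[:3]                              # transform + project
    uv = (proj[:2] / np.clip(proj[2:], 1e-6, None)).T.reshape(H, W, 2)
    return uv
```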
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 12302252 and 12472189) and the Research Program of the National University of Defense Technology, China (No. ZK24-31).
Abstract: Vision-based relative pose estimation plays a pivotal role in various space missions. Deep learning enhances monocular spacecraft pose estimation, but high computational demands necessitate model simplification for onboard systems. In this paper, we aim to achieve an optimal balance between accuracy and computational efficiency. We present a Perspective-n-Point (PnP) based method for spacecraft pose estimation, leveraging lightweight neural networks to localize semantic keypoints and reduce computational load. Since the accuracy of keypoint localization is closely related to heatmap resolution, we devise an efficient upsampling module that increases the resolution of heatmaps with minimal overhead. Furthermore, the heatmaps predicted by lightweight models tend to exhibit high-level noise. To tackle this issue, we propose a weighting strategy based on the statistical characteristics of the predicted semantic keypoints, which substantially improves pose estimation accuracy. Experiments carried out on the SPEED dataset underscore the potential of our method for engineering applications. We dramatically reduce the model parameters to 0.7 M, merely 2.5% of the number required by the top-performing method, while achieving lower pose estimation error and better real-time performance.
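The keypoint-plus-PnP pipeline is standard enough to sketch. Below, sub-pixel keypoints are read from heatmaps with a soft-argmax, and low-confidence detections are discarded before OpenCV's solvePnP; the hard confidence threshold is a simple stand-in for the paper's statistical weighting strategy, and all names are illustrative.

```python
import numpy as np
import cv2

def keypoints_from_heatmaps(heatmaps):
    """Sub-pixel keypoints via soft-argmax over each predicted heatmap.
    heatmaps: (n, H, W). Returns (n, 2) pixel coords and (n,) confidences."""
    n, H, W = heatmaps.shape
    flat = heatmaps.reshape(n, -1)
    prob = np.exp(flat - flat.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    xy = np.stack([prob @ u.ravel(), prob @ v.ravel()], axis=1)  # expected coords
    conf = flat.max(axis=1)               # peak response as a confidence proxy
    return xy, conf

def pose_from_keypoints(obj_pts, img_pts, conf, K, thresh=0.3):
    """Drop noisy keypoints, then solve PnP (needs >= 4 surviving points)."""
    keep = conf > thresh
    ok, rvec, tvec = cv2.solvePnP(obj_pts[keep].astype(np.float64),
                                  img_pts[keep].astype(np.float64), K, None)
    return ok, rvec, tvec
```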
Funding: National Natural Science Foundation of China (Grant No. 62266045) and National Science and Technology Major Project of China (No. 2022YFE0138600).
Abstract: This paper presents a manifold-optimized Error-State Kalman Filter (ESKF) framework for unmanned aerial vehicle (UAV) pose estimation, integrating Inertial Measurement Unit (IMU) data with GPS or LiDAR to enhance estimation accuracy and robustness. We employ a manifold-based optimization approach, leveraging exponential and logarithmic mappings to transform between rotation vectors and rotation matrices. The proposed ESKF framework keeps the error-state variables near the origin, effectively mitigating singularity issues and enhancing numerical stability. Additionally, because the error-state variables are small in magnitude, second-order terms can be neglected, simplifying Jacobian matrix computation and improving computational efficiency. Furthermore, we introduce a novel Kalman filter gain computation strategy that dynamically adapts to low-dimensional and high-dimensional observation equations, enabling efficient processing across different sensor modalities. For resource-constrained UAV platforms in particular, this method significantly reduces computational cost, making it highly suitable for real-time UAV applications.
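The exponential and logarithmic mappings mentioned above are the standard SO(3) maps between rotation vectors and rotation matrices; a self-contained NumPy sketch follows. In an ESKF, the nominal rotation is corrected after each update as R ← R · exp(δθ), with the error state then reset to zero; the filter's own machinery is not shown here.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def so3_exp(w):
    """Rotation vector -> rotation matrix (Rodrigues formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        return np.eye(3) + hat(w)            # first-order fallback near zero
    a = hat(w / theta)
    return np.eye(3) + np.sin(theta) * a + (1 - np.cos(theta)) * (a @ a)

def so3_log(R):
    """Rotation matrix -> rotation vector (inverse mapping)."""
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return axis / 2                      # small-angle approximation
    return theta / (2 * np.sin(theta)) * axis
```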
Funding: The Fundamental Research Funds for the Central Universities provided financial support for this research.
Abstract: Graph convolutional networks (GCNs), an essential tool in human action recognition tasks, have achieved excellent performance in previous studies. However, most current GCN-based skeleton action recognition methods use a shared topology, which cannot flexibly adapt to the diverse correlations between joints under different motion features. Moreover, the video-shooting angle or occlusion of body parts may introduce errors when extracting human pose coordinates with estimation algorithms. In this work, we propose a novel graph convolutional learning framework, called PCCTR-GCN, which integrates pose correction and channel topology refinement for skeleton-based human action recognition. Firstly, a pose correction module (PCM) is introduced, which corrects the input pose coordinates to reduce errors in pose feature extraction. Secondly, channel topology refinement graph convolution (CTR-GC) is employed, which can dynamically learn topology features and aggregate joint features in different channel dimensions, enhancing the feature extraction of graph convolution networks. Finally, considering that the joint stream and bone stream of skeleton data and their dynamic information are also important for distinguishing actions, we employ a multi-stream data fusion approach to improve the network's recognition performance. We evaluate the model using top-1 and top-5 classification accuracy. On the benchmark datasets iMiGUE and Kinetics, top-1 classification accuracy reaches 55.08% and 36.5%, respectively, while top-5 classification accuracy reaches 89.98% and 59.2%, respectively. On the NTU RGB+D dataset, for the two benchmark settings (X-Sub and X-View), classification accuracy reaches 89.7% and 95.4%, respectively.
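The top-1/top-5 metrics quoted here reduce to a few lines; the generic PyTorch sketch below is for reference and is not the authors' evaluation code.

```python
import torch

def topk_accuracy(logits, labels, ks=(1, 5)):
    """Fraction of samples whose true label is among the k highest-scoring
    classes. logits: (N, C), labels: (N,). Returns {k: accuracy}."""
    _, pred = logits.topk(max(ks), dim=1)        # (N, max_k) class indices
    hits = pred.eq(labels.unsqueeze(1))          # (N, max_k) boolean matches
    return {k: hits[:, :k].any(dim=1).float().mean().item() for k in ks}
```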
Abstract: [Objective] Fish pose estimation (FPE) provides fish physiological information, facilitating health monitoring in aquaculture, and aids decision-making in areas such as fish behavior recognition. When fish are injured or deficient, they often display abnormal behaviors and noticeable changes in the positioning of their body parts. Moreover, the unpredictable posture and orientation of fish during swimming, combined with their rapid swimming speed, restrict the current scope of research in FPE. In this research, an FPE model named HPFPE is presented to capture the swimming posture of fish and accurately detect their key points. [Methods] On the one hand, the model incorporates the CBAM module into the HRNet framework; the attention module enhances accuracy without adding computational complexity, while effectively capturing a broader range of contextual information. On the other hand, the model incorporates dilated convolution to enlarge the receptive field, allowing it to capture more spatial context. [Results and Discussions] Experiments showed that, compared with the baseline method, the average precision (AP) of HPFPE based on different backbones and input sizes on the Oplegnathus punctatus dataset increased by 0.62, 1.35, 1.76, and 1.28 percentage points, respectively, while the average recall (AR) increased by 0.85, 1.50, 1.40, and 1.00 percentage points, respectively. HPFPE also outperformed other mainstream methods, including DeepPose, CPM, SCNet, and Lite-HRNet. Furthermore, on the ornamental fish data, HPFPE achieved the highest AP and AR values of 52.96% and 59.50%, respectively. [Conclusions] The proposed HPFPE can accurately estimate fish posture and assess swimming patterns, serving as a valuable reference for applications such as fish behavior recognition.
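CBAM itself is a published module (Woo et al., 2018): channel attention followed by spatial attention, so it can be sketched independently of the HRNet integration described above. A compact PyTorch rendering, with the usual reduction ratio and 7x7 spatial kernel as assumed defaults:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention then
    spatial attention, applied multiplicatively to the feature map."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                              # x: (B, C, H, W)
        avg = x.mean(dim=(2, 3))                       # channel attention inputs
        mx = x.amax(dim=(2, 3))
        ca = torch.sigmoid(self.mlp(avg) + self.mlp(mx))[:, :, None, None]
        x = x * ca
        sa_in = torch.cat([x.mean(1, keepdim=True),    # spatial attention inputs
                           x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(sa_in))
```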
Abstract: Pose estimation of spacecraft targets is a key technology for space operation tasks such as the removal of failed satellites and the detection and scanning of non-cooperative targets. This paper reviews target pose estimation methods based on image feature extraction and PnP, target pose estimation methods based on registration, and spacecraft target pose estimation methods based on deep learning, and introduces the corresponding research methods.
Funding: Supported by the National Science Fund for Distinguished Young Scholars, China (No. 51625501), the Aeronautical Science Foundation of China (No. 20240046051002), and the National Natural Science Foundation of China (No. 52005028).
Abstract: Real-time and accurate drogue pose measurement during docking is fundamental and critical for Autonomous Aerial Refueling (AAR). Vision measurement is the most practicable technique, but its accuracy and robustness are easily affected by the limited computing power of airborne equipment, complex aerial scenes, and partial occlusion. To address these challenges, we propose a novel drogue keypoint detection and pose measurement algorithm based on monocular vision, and realize real-time processing on airborne embedded devices. Firstly, a lightweight network is designed with structural re-parameterization to reduce computational cost and improve inference speed, and a sub-pixel keypoint prediction head and matching loss functions are adopted to improve keypoint detection accuracy. Secondly, a closed-form solution of the drogue pose is computed from two spatial circles, followed by nonlinear refinement based on Levenberg-Marquardt optimization. Both virtual and physical simulation experiments were used to test the proposed method. In the virtual simulation, the mean pixel error of the proposed method is 0.787 pixels, significantly better than that of other methods. In the physical simulation, the mean relative measurement error is 0.788%, and the mean processing time is 13.65 ms on embedded devices.
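The closed-form two-circle solution is specific to this paper, but the Levenberg-Marquardt refinement stage follows a standard pattern: minimize reprojection residuals starting from the closed-form pose. A SciPy/OpenCV sketch of that pattern, with plain keypoint reprojection standing in for the paper's circle-based residual and all names illustrative:

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def refine_pose(rvec0, tvec0, obj_pts, img_pts, K):
    """Refine an initial closed-form pose by minimizing reprojection
    error with Levenberg-Marquardt. obj_pts: (N, 3), img_pts: (N, 2)."""
    def residuals(p):
        proj, _ = cv2.projectPoints(obj_pts, p[:3], p[3:], K, None)
        return (proj.reshape(-1, 2) - img_pts).ravel()

    p0 = np.hstack([rvec0.ravel(), tvec0.ravel()])   # 6-vector: rvec, tvec
    sol = least_squares(residuals, p0, method="lm")
    return sol.x[:3], sol.x[3:]                      # refined rvec, tvec
```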
Funding: Supported by grants from the National Key Research and Development Program of China (2023YFF0724902), the China Postdoctoral Science Foundation (2020M670027, 2023TQ0183), and the Local Standards Research of Beijing Laboratory Tree Shrew (CHYX-2023-DGB001).
Abstract: Background: Quantifying the rich home-cage activities of tree shrews provides a reliable basis for understanding their daily routines and building disease models. However, due to the lack of effective behavioral methods, most efforts on tree shrew behavior are limited to simple measures, resulting in the loss of much behavioral information. Methods: To address this issue, we present a deep learning (DL) approach to achieve markerless pose estimation and recognize multiple spontaneous behaviors of tree shrews, including drinking, eating, resting, and staying in the dark house. Results: This high-throughput approach can monitor the home-cage activities of 16 tree shrews simultaneously over an extended period. Additionally, we demonstrate an innovative system with reliable apparatus, paradigms, and analysis methods for investigating food-grasping behavior. The median duration of each bout of grasping was 0.20 s. Conclusion: This study provides an efficient tool for quantifying and understanding tree shrews' natural behaviors.
Funding: Co-supported by the Science and Technology Innovation Program of Hunan Province, China (No. 2023RC3023) and the National Natural Science Foundation of China (No. 12272404).
Abstract: The autonomous landing guidance of fixed-wing aircraft in unknown structured scenes presents a substantial technological challenge, particularly regarding effective solutions for monocular visual relative pose estimation. This study proposes a novel airborne monocular visual estimation method based on structured scene features to address this challenge. First, a multitask neural network model is established for segmentation, depth estimation, and slope estimation on monocular images, and a comprehensive three-dimensional information metric for monocular images is designed, encompassing length, span, flatness, and slope information. Subsequently, structured edge features are leveraged to filter candidate landing regions adaptively, and by means of the three-dimensional information metric, the optimal landing region is identified accurately and efficiently. Finally, sparse two-dimensional key points are used, for the first time, to parameterize the optimal landing region, and high-precision relative pose estimation is achieved. Additional measurement information is introduced to provide autonomous landing guidance information between the aircraft and the optimal landing region. Experimental results on both synthetic and real data demonstrate the effectiveness of the proposed method in monocular pose estimation for autonomous aircraft landing guidance in unknown structured scenes.
Funding: Supported by the Key Research and Development Projects in Shaanxi Province (No. 2021GY-265) and the Xi'an University Talent Service Enterprise Project (No. 2020KJRC0049).
Abstract: In complex industrial scenes, it is difficult to acquire a high-precision non-cooperative target pose under monocular visual servo control. This paper presents a new method of target extraction and high-precision edge fitting for the wheel of the sintering trolley in steel production, which fuses multiple target extraction algorithms adapted to the working environment of the target. Firstly, based on the obvious difference between target and non-target pixels in the gray histogram, the pixels were classified and then segmented within each class, removing interference and retaining the target image. Multiple segmentation results were then merged, and a final target image was obtained after small connected regions were eliminated. In the edge fitting stage, an edge fitting method based on the best circumscribed rectangle was proposed to accurately fit the circular target edge. Finally, the PnP algorithm was adopted for pose measurement of the target. The experimental results showed that the average estimation error of the pose angle γ with respect to rotation about the z-axis was 0.2346°, the average measurement error of the pose angle α with respect to rotation about the x-axis was 0.1703°, and the average measurement error of the pose angle β with respect to rotation about the y-axis was 0.2275°. The proposed method has practical application value.
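The segment-then-fit stage above can be illustrated with standard OpenCV primitives. In this sketch, Otsu thresholding stands in for the paper's histogram-class segmentation, small connected regions are removed, and a least-squares ellipse is fitted to the remaining contour; the resulting edge could then feed a PnP solve. Names and the thresholding choice are assumptions, not the paper's exact pipeline.

```python
import numpy as np
import cv2

def fit_circular_target(gray):
    """Segment a roughly circular target by intensity, keep the largest
    connected component, and least-squares fit an ellipse to its contour.
    gray: 8-bit single-channel image containing one dominant target."""
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # keep only the largest foreground component (drops small regions)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    big = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])   # label 0 is background
    mask = np.where(labels == big, 255, 0).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    cnt = max(contours, key=cv2.contourArea)
    (cx, cy), (w, h), angle = cv2.fitEllipse(cnt)      # least-squares ellipse
    return (cx, cy), (w, h), angle
```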
Funding: Supported in part by the Scientific Research Start-Up Fund of Zhejiang Sci-Tech University, under the project titled "(National Treasury) Development of a Digital Silk Museum System Based on Metaverse and AR" (Project No. 11121731282202-01).
Abstract: In recent years, the Transformer has achieved remarkable results in the field of computer vision, with its built-in attention layers effectively modeling global dependencies in images by transforming image features into token form. However, Transformers often face high computational costs when processing large-scale image data, which limits their feasibility in real-time applications. To address this issue, we propose Token Masked Pose Transformers (TMPose), an efficient Transformer network for pose estimation. This network applies semantic-level masking to tokens and employs three different masking strategies to optimize model performance, aiming to reduce computational complexity. Experimental results show that TMPose reduces computational complexity by 61.1% on the COCO validation dataset, with negligible loss in accuracy. Our performance on the MPII dataset is also competitive. This research not only enhances the accuracy of pose estimation but also significantly reduces the demand for computational resources, providing new directions for further studies in this field.
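The paper's three semantic masking strategies are its own; the generic mechanism they share — keeping only a subset of tokens so the quadratic attention cost shrinks — can be sketched as below, with token-norm saliency as an assumed placeholder for a real masking criterion.

```python
import torch

def mask_tokens(tokens, keep_ratio=0.4, scores=None):
    """Keep only the highest-scoring fraction of tokens before the attention
    layers; self-attention cost drops roughly with the square of the kept
    count. tokens: (B, N, D). Returns (B, k, D) with k = keep_ratio * N."""
    B, N, D = tokens.shape
    k = max(1, int(N * keep_ratio))
    if scores is None:                        # default: L2 norm as saliency proxy
        scores = tokens.norm(dim=-1)          # (B, N)
    idx = scores.topk(k, dim=1).indices       # (B, k) indices of kept tokens
    return tokens.gather(1, idx.unsqueeze(-1).expand(B, k, D))
```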
Funding: Supported by the National Natural Science Foundation of China (No. U2037602).
Abstract: To address the challenges encountered in visual navigation for asteroid landing with traditional point features, such as significant recognition and extraction errors, low computational efficiency, and limited navigation accuracy, a novel approach to multi-type fusion visual navigation is proposed. This method aims to overcome the limitations of single-type features and enhance navigation accuracy. Analytical criteria for selecting multi-type features are introduced, which simultaneously improve computational efficiency and system navigation accuracy. Concerning pose estimation, both absolute and relative pose estimation methods based on multi-type feature fusion are proposed, and multi-type feature normalization is established, which significantly improves system navigation accuracy and lays the groundwork for flexible application of joint absolute-relative estimation. The feasibility and effectiveness of the proposed method are validated through simulation experiments on asteroid 4769 Castalia.
Funding: The National Key Research and Development Program of China (No. 2018YFB1305005).
Abstract: Passive optical motion capture technology is an effective means of conducting high-precision pose estimation of mobile robots in small scenes; nevertheless, with a complex background and stray-light interference in the scene, target adhesion and environmental reflection prevent this technology from estimating the pose accurately. A passive binocular optical motion capture technology for complex illumination, based on a binocular camera and fixed retroreflective marker balls, is proposed. With multiple hemispherical retroreflective marker balls fixed on a rigid base, the binocular camera performs depth estimation to obtain the fixed positional relationship between the feature points. After performing unsupervised state estimation without manual operation, the method overcomes the influence of reflection spots in the background. Meanwhile, contour extraction and ellipse least-squares fitting are used to extract marker balls with incomplete shapes as feature points, solving the problem of target adhesion in the scene. A FANUC m10i-a robot moving with 6 DOF is used to verify the above methods in the complex lighting environment of a welding laboratory. The results show that the average absolute position error is 5.793 mm, the average absolute rotation error is 1.997°, the average relative position error is 0.972 mm, and the average relative rotation error is 0.002°. Therefore, this technology meets the requirements of high-precision measurement in a complex lighting environment when estimating a 6-DOF mobile robot and has significant application prospects in complex scenes.
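Once the binocular camera has triangulated the marker-ball centers, the rigid pose follows from aligning them with the known marker layout on the base. The standard least-squares alignment (Kabsch/Umeyama, no scale) is sketched below; this is a generic solver, not necessarily the paper's exact formulation.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform mapping marker-ball coordinates in the
    base frame, src (N, 3), onto their triangulated positions, dst (N, 3).
    Returns R (3, 3) and t (3,) such that dst ~= R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ S @ U.T
    t = mu_d - R @ mu_s
    return R, t
```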
Funding: Supported by the STI 2030-Major Projects (2022ZD0211900, 2022ZD0211902, 2021ZD0204500, 2021ZD0204503), the National Natural Science Foundation of China (32171461), and the National Key Research and Development Program of China (2023YFC3208303).
Abstract: Advancements in animal behavior quantification methods have driven the development of computational ethology, enabling fully automated behavior analysis. Existing multi-animal pose estimation workflows rely on tracking-by-detection frameworks, in either bottom-up or top-down form, and require retraining to accommodate diverse animal appearances. This study introduces InteBOMB, an integrated workflow that enhances top-down approaches by incorporating generic object tracking, eliminating the need for prior knowledge of the target animals while maintaining broad generalizability. InteBOMB includes two key strategies for tracking and segmentation in laboratory environments and two techniques for pose estimation in natural settings. The "background enhancement" strategy optimizes a foreground-background contrastive loss, generating more discriminative correlation maps. The "online proofreading" strategy stores human-in-the-loop long-term memory and dynamic short-term memory, enabling adaptive updates to object visual features. The "automated labeling suggestion" technique reuses the visual features saved during tracking to identify representative frames for training-set labeling. Additionally, the "joint behavior analysis" technique integrates these features with multimodal data, expanding the latent space for behavior classification and clustering. To evaluate the framework, six datasets of mice and six datasets of nonhuman primates were compiled, covering laboratory and natural scenes. Benchmarking results demonstrated a 24% improvement in zero-shot generic tracking and a 21% enhancement in joint latent space performance across datasets, highlighting the effectiveness of this approach for robust, generalizable behavior analysis.
Funding: Funded by the Joint Development Project with Pharmapack Technologies Corporation: Open Multi-Person Collaborative Virtual Assembly/Disassembly Training and Virtual Engineering Visualization Platform, Grant Number 23HK0101.
Abstract: Virtual maintenance, as an important means of industrial training and education, places strict requirements on the accuracy of participant pose perception and the assessment of motion standardization. However, existing research mainly focuses on human pose estimation in general scenarios, lacking specialized solutions for maintenance scenarios. This paper proposes a virtual maintenance human pose estimation method based on multi-scale feature enhancement (VMHPE), which integrates adaptive input feature enhancement, multi-scale feature correction for improved expression of fine movements and complex poses, and multi-scale feature fusion to enhance keypoint localization accuracy. Meanwhile, this study constructs the first virtual maintenance-specific human keypoint dataset (VMHKP), which records standard action sequences of professional maintenance personnel in five typical maintenance tasks and provides a reliable benchmark for evaluating operator motion standardization. The dataset is publicly available. Using high-precision keypoint prediction results, an action assessment system based on topological structure similarity was established. Experiments show that our method achieves significant performance improvements: average precision (AP) reaches 94.4%, an increase of 2.3 percentage points over baseline methods; average recall (AR) reaches 95.6%, an increase of 1.3 percentage points. This research establishes a scientific four-level evaluation standard based on comparative motion analysis and provides a reliable solution for standardizing industrial maintenance training.
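The topological-structure-similarity assessment is the paper's own design; one plausible primitive for such a comparison — mean cosine similarity of normalized limb-direction vectors between a trainee pose and a reference pose — is sketched below. The EDGES skeleton and all names are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def pose_similarity(kp_a, kp_b):
    """Compare two poses by the cosine similarity of their normalized
    limb-direction vectors. kp_a, kp_b: (J, 2) keypoint arrays; a score of
    1.0 means every compared limb points the same way in both poses."""
    EDGES = [(0, 1), (1, 2), (2, 3)]   # illustrative skeleton topology

    def limb_dirs(kp):
        v = np.array([kp[j] - kp[i] for i, j in EDGES], dtype=float)
        return v / np.maximum(np.linalg.norm(v, axis=1, keepdims=True), 1e-8)

    a, b = limb_dirs(kp_a), limb_dirs(kp_b)
    return float(np.mean(np.sum(a * b, axis=1)))   # mean cosine over limbs
```

Thresholding such a score at several cut points would yield a graded rating in the spirit of the four-level evaluation standard described above.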