Funding: Supported in part by the National Natural Science Foundation of China under Grants 61973065, U20A20197, and 61973063.
Abstract: Previous multi-view 3D human pose estimation methods neither correlate different human joints within each view nor explicitly model learnable correlations between the same joints across views, meaning that skeleton structure information is not utilized and multi-view pose information is not fully fused. Moreover, existing graph convolutional operations do not consider the specificity of different joints and different views of pose information when processing skeleton graphs, so the correlation weights between nodes in the graph and their neighborhood nodes are shared. Existing Graph Convolutional Networks (GCNs) also cannot efficiently extract global and deep-level skeleton structure information and view correlations. To solve these problems, pre-estimated multi-view 2D poses are organized as a multi-view skeleton graph that fuses skeleton priors and view correlations explicitly to handle occlusion, with the skeleton-edge and symmetry-edge representing the structural correlations between adjacent joints in each view of the skeleton graph, and the view-edge representing the view correlations between the same joints in different views. To make the graph convolution operation mine elaborate and sufficient skeleton structure information and view correlations, different correlation weights are assigned to different categories of neighborhood nodes and further assigned to each node in the graph. Based on this graph convolution operation, a Residual Graph Convolution (RGC) module is designed as the basic module and combined with a simplified Hourglass architecture to construct Hourglass-GCN as our 3D pose estimation network. Hourglass-GCN, with a symmetrical and concise architecture, processes three scales of multi-view skeleton graphs to efficiently extract local-to-global scale and shallow-to-deep level skeleton features. Experimental results on the large public 3D pose datasets Human3.6M and MPI-INF-3DHP show that Hourglass-GCN outperforms several strong methods in 3D pose estimation accuracy.
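As a rough illustration of the edge-type-specific graph convolution described above, the following is a minimal sketch, not the authors' implementation: it assumes one adjacency matrix per edge category (skeleton-edge, symmetry-edge, view-edge) plus a learnable per-node-pair scaling so that correlation weights are not shared across nodes. The class name, tensor shapes, and initialization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiViewSkeletonGC(nn.Module):
    """Sketch of a graph convolution with separate weights per edge category
    (skeleton-edge, symmetry-edge, view-edge) and per-node-pair correlation scales."""
    def __init__(self, in_dim, out_dim, adj_per_type, num_nodes):
        super().__init__()
        # adj_per_type: list of (num_nodes, num_nodes) adjacency matrices, one per edge category
        self.register_buffer("adjs", torch.stack(adj_per_type))                # (T, N, N)
        self.weights = nn.Parameter(torch.randn(len(adj_per_type), in_dim, out_dim) * 0.01)
        # learnable per-category, per-node-pair scaling so neighborhood weights are not shared
        self.node_scale = nn.Parameter(torch.ones(len(adj_per_type), num_nodes, num_nodes))

    def forward(self, x):                                                      # x: (B, N, in_dim)
        out = 0
        for t in range(self.adjs.shape[0]):
            a = self.adjs[t] * self.node_scale[t]                              # modulated adjacency
            a = a / a.sum(dim=-1, keepdim=True).clamp(min=1e-6)                # row-normalize
            out = out + a @ x @ self.weights[t]                                # aggregate per category
        return torch.relu(out)
```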
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62071345.
Abstract: Self-supervised monocular depth estimation has emerged as a major research focus in recent years, primarily because it eliminates the dependence on ground-truth depth. However, the prevailing architectures in this domain suffer from inherent limitations: existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions. These assumptions are often violated in real-world scenarios due to dynamic objects, non-Lambertian reflectance, and unstructured background elements, leading to pervasive artifacts such as depth discontinuities ("holes"), structural collapse, and ambiguous reconstruction. To address these challenges, we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network, enhancing its ability to model complex scene dynamics. Our contributions are threefold: (1) a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations; (2) a physically informed loss function that couples dynamic pose and depth predictions, designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles; (3) an efficient SE(3) transformation parameterization that streamlines network complexity and temporal pre-processing. Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity, significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.
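To make the pixel-wise dynamic pose idea more concrete, here is a minimal sketch, under stated assumptions and not the paper's actual network, of how a per-pixel residual SE(3) field could be composed with the global ego-motion when warping points for a photometric loss. The function name, tensor shapes, and the small-angle approximation of the per-pixel rotation are all illustrative assumptions.

```python
import torch

def warp_points(depth, K, T_cam, delta_rot, delta_t):
    """depth: (B,1,H,W); K: (B,3,3) intrinsics; T_cam: (B,4,4) global ego-motion;
    delta_rot: (B,3,H,W) per-pixel axis-angle residual; delta_t: (B,3,H,W) per-pixel translation."""
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().view(1, 3, -1)   # (1, 3, HW)
    cam = torch.linalg.inv(K) @ pix * depth.view(B, 1, -1)                       # back-project
    cam_h = torch.cat([cam, torch.ones(B, 1, cam.shape[-1])], dim=1)             # homogeneous
    cam_global = (T_cam @ cam_h)[:, :3]                                          # apply ego-motion
    # per-pixel residual rotation applied as a small-angle approximation: R ~ I + [w]x
    w = delta_rot.view(B, 3, -1)
    rotated = cam_global + torch.cross(w, cam_global, dim=1)
    moved = rotated + delta_t.view(B, 3, -1)                                     # residual translation
    proj = K @ moved
    return proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)                            # reprojected pixels
```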
Funding: National Natural Science Foundation of China (Grant No. 62266045); National Science and Technology Major Project of China (No. 2022YFE0138600).
Abstract: This paper presents a manifold-optimized Error-State Kalman Filter (ESKF) framework for unmanned aerial vehicle (UAV) pose estimation, integrating Inertial Measurement Unit (IMU) data with GPS or LiDAR to enhance estimation accuracy and robustness. We employ a manifold-based optimization approach, leveraging exponential and logarithmic mappings to transform rotation vectors into rotation matrices. The proposed ESKF framework keeps the state variables near the origin, effectively mitigating singularity issues and enhancing numerical stability. Additionally, because the state variables remain small, second-order terms can be neglected, which simplifies Jacobian computation and improves computational efficiency. Furthermore, we introduce a novel Kalman filter gain computation strategy that dynamically adapts to low-dimensional and high-dimensional observation equations, enabling efficient processing across different sensor modalities. For resource-constrained UAV platforms in particular, this method significantly reduces computational cost, making it highly suitable for real-time UAV applications.
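The exponential and logarithmic mappings mentioned above are standard SO(3) machinery; a minimal sketch follows, together with the error-injection/reset step that keeps an ESKF attitude error near the origin. This is generic SO(3)/ESKF code, not the paper's specific filter, and the numerical values in the reset example are arbitrary.

```python
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def exp_so3(phi):
    """Map a rotation vector (error state) to a rotation matrix via Rodrigues' formula."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3) + skew(phi)                # first-order approximation near the origin
    a = phi / theta
    K = skew(a)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Map a rotation matrix back to a rotation vector."""
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-9:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2 * np.sin(theta)) * w

# ESKF attitude reset: inject the estimated error into the nominal state,
# then set the error back to zero so it stays near the origin.
R_nominal = np.eye(3)
delta_theta = np.array([0.01, -0.02, 0.005])        # small attitude error from the update step
R_nominal = R_nominal @ exp_so3(delta_theta)
delta_theta[:] = 0.0
```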
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 12302252 and 12472189) and the Research Program of the National University of Defense Technology, China (No. ZK24-31).
Abstract: Vision-based relative pose estimation plays a pivotal role in various space missions. Deep learning enhances monocular spacecraft pose estimation, but high computational demands necessitate model simplification for onboard systems. In this paper, we aim to achieve an optimal balance between accuracy and computational efficiency. We present a Perspective-n-Point (PnP) based method for spacecraft pose estimation, leveraging lightweight neural networks to localize semantic keypoints and reduce the computational load. Since the accuracy of keypoint localization is closely related to the heatmap resolution, we devise an efficient upsampling module that increases the resolution of the heatmaps with minimal overhead. Furthermore, the heatmaps predicted by lightweight models tend to exhibit high levels of noise. To tackle this issue, we propose a weighting strategy based on the statistical characteristics of the predicted semantic keypoints, which substantially improves the pose estimation accuracy. Experiments on the SPEED dataset underscore the promise of our method for engineering applications. We dramatically reduce the model parameters to 0.7 M, merely 2.5% of that required by the top-performing method, while achieving lower pose estimation error and better real-time performance.
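A minimal sketch of the heatmap-to-PnP pipeline described above: keypoints are taken at heatmap peaks and only confident ones are passed to OpenCV's standard solvePnP routine. The peak-confidence filtering here is a simplified stand-in for the paper's statistics-based weighting strategy, and the threshold, function name, and model-point format are assumptions.

```python
import cv2
import numpy as np

def pose_from_heatmaps(heatmaps, model_points, camera_matrix, conf_thresh=0.3):
    """heatmaps: (K, H, W) predicted keypoint heatmaps; model_points: (K, 3) spacecraft model.
    Keypoints with low peak confidence are discarded before PnP."""
    img_pts, obj_pts = [], []
    for k, hm in enumerate(heatmaps):
        y, x = np.unravel_index(np.argmax(hm), hm.shape)    # peak location
        conf = hm[y, x]                                      # peak value as confidence
        if conf >= conf_thresh:
            img_pts.append([x, y])
            obj_pts.append(model_points[k])
    if len(img_pts) < 4:
        return None                                          # not enough reliable keypoints
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(obj_pts, dtype=np.float32),
        np.asarray(img_pts, dtype=np.float32),
        camera_matrix, distCoeffs=None,
        flags=cv2.SOLVEPNP_EPNP)
    return (rvec, tvec) if ok else None
```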
Abstract: Pose estimation of spacecraft targets is a key technology for space operation tasks such as the removal of failed satellites and the detection and scanning of non-cooperative targets. This paper reviews target pose estimation methods based on image feature extraction and PnP, registration-based pose estimation methods, and deep-learning-based spacecraft pose estimation methods, and introduces the corresponding research approaches.
Abstract: [Objective] Fish pose estimation (FPE) provides fish physiological information, facilitating health monitoring in aquaculture and aiding decision-making in areas such as fish behavior recognition. When fish are injured or deficient, they often display abnormal behaviors and noticeable changes in the positioning of their body parts. Moreover, the unpredictable posture and orientation of fish during swimming, combined with their rapid swimming speed, restrict the current scope of research on FPE. In this research, an FPE model named HPFPE is presented to capture the swimming posture of fish and accurately detect their key points. [Methods] On the one hand, the model incorporates the CBAM module into the HRNet framework; the attention module enhances accuracy without adding computational complexity while effectively capturing a broader range of contextual information. On the other hand, the model incorporates dilated convolution to enlarge the receptive field, allowing it to capture more spatial context. [Results and Discussions] Experiments showed that compared with the baseline method, the average precision (AP) of HPFPE based on different backbones and input sizes on the Oplegnathus punctatus datasets increased by 0.62, 1.35, 1.76, and 1.28 percentage points, respectively, while the average recall (AR) increased by 0.85, 1.50, 1.40, and 1.00 percentage points, respectively. Additionally, HPFPE outperformed other mainstream methods, including DeepPose, CPM, SCNet, and Lite-HRNet. Furthermore, when compared with other methods on the ornamental fish data, HPFPE achieved the highest AP and AR values of 52.96% and 59.50%, respectively. [Conclusions] The proposed HPFPE can accurately estimate fish posture and assess swimming patterns, serving as a valuable reference for applications such as fish behavior recognition.
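For reference, the CBAM attention mentioned above is typically formulated as channel attention followed by spatial attention; the sketch below is that commonly used form, not necessarily HPFPE's exact configuration, and the reduction ratio and kernel size are typical defaults rather than the paper's values.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module in its commonly used form:
    channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):                                    # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                   # channel attention from average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))                    # ... and from max pooling
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(dim=1, keepdim=True),          # spatial attention from pooled maps
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```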
Funding: Supported by grants from the National Key Research and Development Program of China (2023YFF0724902), the China Postdoctoral Science Foundation (2020M670027, 2023TQ0183), and the Local Standards Research of Beijing Laboratory Tree Shrew (CHYX-2023-DGB001).
Abstract: Background: Quantifying the rich home-cage activities of tree shrews provides a reliable basis for understanding their daily routines and building disease models. However, due to the lack of effective behavioral methods, most efforts on tree shrew behavior are limited to simple measures, resulting in the loss of much behavioral information. Methods: To address this issue, we present a deep learning (DL) approach to achieve markerless pose estimation and recognize multiple spontaneous behaviors of tree shrews, including drinking, eating, resting, and staying in the dark house. Results: This high-throughput approach can monitor the home-cage activities of 16 tree shrews simultaneously over an extended period. Additionally, we demonstrated an innovative system with reliable apparatus, paradigms, and analysis methods for investigating food-grasping behavior. The median duration of each bout of grasping was 0.20 s. Conclusion: This study provides an efficient tool for quantifying and understanding tree shrews' natural behaviors.
Funding: Funded by the Joint Development Project with Pharmapack Technologies Corporation: Open Multi-Person Collaborative Virtual Assembly/Disassembly Training and Virtual Engineering Visualization Platform, Grant Number 23HK0101.
Abstract: Virtual maintenance, as an important means of industrial training and education, places strict requirements on the accuracy of participant pose perception and the assessment of motion standardization. However, existing research mainly focuses on human pose estimation in general scenarios, lacking specialized solutions for maintenance scenarios. This paper proposes a virtual maintenance human pose estimation method based on multi-scale feature enhancement (VMHPE), which integrates adaptive input feature enhancement, multi-scale feature correction for improved expression of fine movements and complex poses, and multi-scale feature fusion to enhance keypoint localization accuracy. In addition, this study constructs the first virtual-maintenance-specific human keypoint dataset (VMHKP), which records standard action sequences of professional maintenance personnel in five typical maintenance tasks and provides a reliable benchmark for evaluating operator motion standardization. The dataset is publicly available. Using the high-precision keypoint predictions, an action assessment system based on topological structure similarity was established. Experiments show that our method achieves significant performance improvements: average precision (AP) reaches 94.4%, an increase of 2.3 percentage points over baseline methods, and average recall (AR) reaches 95.6%, an increase of 1.3 percentage points. This research establishes a scientific four-level evaluation standard based on comparative motion analysis and provides a reliable solution for standardizing industrial maintenance training.
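One plausible way to score topological structure similarity between a trainee's pose and a reference pose, and to map it onto a four-level grade, is limb-vector cosine similarity; the sketch below is an illustration under assumed skeleton edges and grading thresholds, not the paper's actual assessment system.

```python
import numpy as np

# Illustrative skeleton topology: pairs of keypoint indices forming limbs (assumed, not the paper's list)
LIMBS = [(5, 7), (7, 9), (6, 8), (8, 10), (11, 13), (13, 15), (12, 14), (14, 16)]

def topology_similarity(pose, ref_pose, limbs=LIMBS):
    """pose, ref_pose: (K, 2) keypoint coordinates. Returns the mean cosine similarity
    of corresponding limb vectors, a simple topological-structure score in [-1, 1]."""
    sims = []
    for a, b in limbs:
        v1, v2 = pose[b] - pose[a], ref_pose[b] - ref_pose[a]
        denom = np.linalg.norm(v1) * np.linalg.norm(v2)
        if denom > 1e-6:
            sims.append(float(v1 @ v2 / denom))
    return float(np.mean(sims)) if sims else 0.0

def grade(score):
    """Assumed four-level grading of motion standardization from the similarity score."""
    levels = [(0.95, "excellent"), (0.85, "good"), (0.70, "acceptable")]
    return next((name for t, name in levels if score >= t), "non-standard")
```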
Funding: Supported by the STI 2030-Major Projects (2022ZD0211900, 2022ZD0211902), the STI 2030-Major Projects (2021ZD0204500, 2021ZD0204503), the National Natural Science Foundation of China (32171461), and the National Key Research and Development Program of China (2023YFC3208303).
Abstract: Advancements in animal behavior quantification methods have driven the development of computational ethology, enabling fully automated behavior analysis. Existing multi-animal pose estimation workflows rely on tracking-by-detection frameworks for either bottom-up or top-down approaches, requiring retraining to accommodate diverse animal appearances. This study introduces InteBOMB, an integrated workflow that enhances top-down approaches by incorporating generic object tracking, eliminating the need for prior knowledge of target animals while maintaining broad generalizability. InteBOMB includes two key strategies for tracking and segmentation in laboratory environments and two techniques for pose estimation in natural settings. The "background enhancement" strategy optimizes a foreground-background contrastive loss, generating more discriminative correlation maps. The "online proofreading" strategy stores human-in-the-loop long-term memory and dynamic short-term memory, enabling adaptive updates to object visual features. The "automated labeling suggestion" technique reuses the visual features saved during tracking to identify representative frames for training-set labeling. Additionally, the "joint behavior analysis" technique integrates these features with multimodal data, expanding the latent space for behavior classification and clustering. To evaluate the framework, six datasets of mice and six datasets of nonhuman primates were compiled, covering laboratory and natural scenes. Benchmarking results demonstrated a 24% improvement in zero-shot generic tracking and a 21% enhancement in joint latent space performance across datasets, highlighting the effectiveness of this approach for robust, generalizable behavior analysis.
Funding: Supported by the National Natural Science Foundation of China (No. 52272390), the Natural Science Foundation of Heilongjiang Province of China (No. YQ2022A009), and the Shanghai Sailing Program, China (No. 20YF1417300).
Abstract: Real-time 6 Degree-of-Freedom (DoF) pose estimation is of paramount importance for various on-orbit tasks. Benefiting from developments in deep learning, the use of Convolutional Neural Networks (CNNs) for feature extraction has yielded impressive results in spacecraft pose estimation. To improve the robustness and interpretability of CNNs, this paper proposes a Pose Estimation approach based on a Variational Auto-Encoder structure (PE-VAE) and a Feature-Aided pose estimation approach based on a Variational Auto-Encoder structure (FA-VAE), which aim to accurately estimate the 6-DoF pose of a target spacecraft. Both methods treat the pose vector as latent variables, employing an encoder-decoder network with a Variational Auto-Encoder (VAE) structure. To enhance the precision of pose estimation, PE-VAE uses the VAE structure to introduce a reconstruction mechanism over the whole image. Furthermore, FA-VAE enforces feature shape constraints by exclusively reconstructing the segment of the target spacecraft with the desired shape. Comparative evaluation against leading methods on public datasets reveals similar accuracy with a threefold improvement in processing speed, showcasing the significant contribution of VAE structures to accuracy enhancement and the additional benefit of incorporating global shape prior features.
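The core idea of treating the pose vector as the latent variables of a VAE can be sketched as follows. This is a minimal illustration, not the PE-VAE architecture: the encoder predicts a distribution over a 6-DoF pose, a sampled pose is the latent code, and the decoder reconstructs the image from it. Layer sizes and the reconstruction resolution are assumptions.

```python
import torch
import torch.nn as nn

class PoseVAE(nn.Module):
    """Sketch: encoder outputs a pose distribution, the sampled pose is the latent code,
    and the decoder reconstructs the image from that pose code."""
    def __init__(self, pose_dim=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.mu = nn.Linear(64, pose_dim)
        self.logvar = nn.Linear(64, pose_dim)
        self.decoder = nn.Sequential(
            nn.Linear(pose_dim, 64 * 8 * 8), nn.ReLU(), nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1))

    def forward(self, img):
        h = self.encoder(img)
        mu, logvar = self.mu(h), self.logvar(h)
        pose = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)      # reparameterization trick
        recon = self.decoder(pose)                                       # reconstruction from the pose code
        return pose, recon, mu, logvar

# Training would combine a pose regression loss, an image reconstruction loss, and the
# usual KL term: -0.5 * (1 + logvar - mu**2 - logvar.exp()).sum(dim=1).mean()
```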
Funding: National Natural Science Foundation of China (Grant Number 62076246).
Abstract: Human pose estimation aims to localize the body joints from image or video data. With the development of deep learning, pose estimation has become a hot research topic in the field of computer vision. In recent years, human pose estimation has achieved great success in multiple fields such as animation and sports. However, to obtain accurate positioning results, existing methods may suffer from large model sizes, a high number of parameters, and increased complexity, leading to high computing costs. In this paper, we propose a new lightweight feature encoder to construct a high-resolution network that reduces the number of parameters and lowers the computing cost. We also introduce a semantic enhancement module that improves global feature extraction and network performance by combining the channel and spatial dimensions. Furthermore, we propose a densely connected spatial pyramid pooling module to compensate for the decrease in image resolution and the loss of information in the network. Our method effectively reduces the number of parameters and the complexity while maintaining high performance. Extensive experiments show that our method achieves competitive performance while dramatically reducing the number of parameters and the operational complexity. Specifically, our method obtains an 89.9% AP score on MPII VAL, while the number of parameters and the complexity of operations are reduced by 41% and 36%, respectively.
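A densely connected spatial pyramid pooling module can be sketched in a DenseASPP-like style, where each dilated branch also sees the outputs of all previous branches. The class name, dilation rates, and channel widths below are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DenseSpatialPyramidPooling(nn.Module):
    """Sketch of a densely connected spatial pyramid: each dilated branch receives the
    concatenation of the input and all earlier branch outputs."""
    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList()
        in_ch = channels
        for r in rates:
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_ch, channels, 3, padding=r, dilation=r),
                nn.BatchNorm2d(channels), nn.ReLU()))
            in_ch += channels                         # dense connectivity grows the input width
        self.fuse = nn.Conv2d(in_ch, channels, 1)     # project back to the original width

    def forward(self, x):
        feats = [x]
        for branch in self.branches:
            feats.append(branch(torch.cat(feats, dim=1)))
        return self.fuse(torch.cat(feats, dim=1))
```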
Funding: Supported by the Natural Science Foundation of Jiangsu Province (No. BK20230696).
Abstract: Electric power training is essential for ensuring the safety and reliability of the power system. In this study, we introduce a novel Abnormal Action Recognition (AAR) system that utilizes a Lightweight Pose Estimation Network (LPEN) to efficiently and effectively detect abnormal fall-down and trespass incidents in electric power training scenarios. The LPEN network, comprising three stages (MobileNet, Initial Stage, and Refinement Stage), is employed to swiftly extract image features, detect human key points, and refine them for accurate analysis. Subsequently, a Pose-aware Action Analysis Module (PAAM) captures the positional coordinates of human skeletal points in each frame. Finally, an Abnormal Action Inference Module (AAIM) evaluates whether abnormal fall-down or unauthorized trespass behavior is occurring. For fall-down recognition, three criteria are considered: falling speed, the main angles of skeletal points, and the person's bounding box. To identify unauthorized trespass, emphasis is placed on the position of the ankles. Extensive experiments validate the effectiveness and efficiency of the proposed system in ensuring the safety and reliability of electric power training.
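The inference step lends itself to simple rules over the extracted keypoints. The sketch below combines the three fall-down cues named above and an ankle-in-restricted-area check; all thresholds, function names, and the polygon-test interface are illustrative assumptions rather than the paper's AAIM logic.

```python
import numpy as np

def detect_fall(hip_y_history, torso_angle_deg, bbox_w, bbox_h,
                speed_thresh=0.08, angle_thresh=50.0):
    """Rule-based fall-down check combining the three cues: vertical hip speed,
    torso angle from vertical, and bounding-box aspect ratio (thresholds assumed)."""
    fall_speed = np.diff(hip_y_history).max() if len(hip_y_history) > 1 else 0.0
    lying_box = bbox_w > bbox_h                     # a lying person tends to have a wide box
    return (fall_speed > speed_thresh) and (torso_angle_deg > angle_thresh) and lying_box

def detect_trespass(ankle_points, restricted_area_test):
    """Trespass check: flag if any ankle keypoint falls inside a restricted area.
    restricted_area_test is any callable mapping an (x, y) point to True/False."""
    return any(restricted_area_test(p) for p in ankle_points)
```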
Funding: Supported by the Natural Science Foundation of Hubei Province of China under grant number 2022CFB536, the National Natural Science Foundation of China under grant number 62367006, and the 15th Graduate Education Innovation Fund of Wuhan Institute of Technology under grant number CX2023579.
Abstract: Human pose estimation is a critical research area in the field of computer vision, playing a significant role in applications such as human-computer interaction, behavior analysis, and action recognition. In this paper, we propose a U-shaped keypoint detection network (DAUNet) based on an improved ResNet subsampling structure and a spatial grouping mechanism. This network addresses key challenges in traditional methods, such as information loss, large network redundancy, and insufficient sensitivity to low-resolution features. DAUNet is composed of three main components. First, we introduce an improved BottleNeck block that employs partial convolution and strip pooling to reduce the computational load and mitigate feature loss. Second, after upsampling, the network eliminates redundant features, improving overall efficiency. Finally, a lightweight spatial grouping attention mechanism is applied to enhance low-resolution semantic features within the feature map, allowing better restoration of the original image size and higher accuracy. Experimental results demonstrate that DAUNet achieves superior accuracy compared to most existing keypoint detection models, with a mean PCKh@0.5 score of 91.6% on the MPII dataset and an AP of 76.1% on the COCO dataset. Moreover, real-world experiments further validate the robustness and generalizability of DAUNet for detecting human bodies in unknown environments, highlighting its potential for broader applications.
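Strip pooling, used in the improved BottleNeck block above, replaces square pooling windows with long, narrow ones along the height and width axes. The sketch below follows the commonly used formulation of a strip pooling block, not necessarily DAUNet's exact variant, and the kernel sizes are typical defaults.

```python
import torch
import torch.nn as nn

class StripPooling(nn.Module):
    """Sketch of a strip pooling block: banded context along H and W gates the input features."""
    def __init__(self, channels):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))     # (B, C, H, 1): pool across width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))     # (B, C, 1, W): pool across height
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                                  # x: (B, C, H, W)
        h = self.conv_h(self.pool_h(x)).expand_as(x)       # broadcast the vertical strip back
        w = self.conv_w(self.pool_w(x)).expand_as(x)       # broadcast the horizontal strip back
        return x * torch.sigmoid(self.fuse(torch.relu(h + w)))
```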
Funding: This work was supported by grants from the Natural Science Foundation of Hebei Province under Grant No. F2021202021, the S&T Program of Hebei under Grant No. 22375001D, and the National Key R&D Program of China under Grant No. 2019YFB1312500.
Abstract: Human pose estimation is a basic and critical task in the field of computer vision that involves determining the position (or spatial coordinates) of the joints of the human body in a given image or video. It is widely used in motion analysis, medical evaluation, and behavior monitoring. In this paper, the authors propose a method for multi-view human pose estimation. Two image sensors were placed orthogonally with respect to each other to capture the pose of the subject as they moved, yielding accurate and comprehensive three-dimensional (3D) motion reconstruction that captures their multi-directional poses. Building on this, we propose a method based on 3D pose estimation to assess the similarity of the motion features of patients with motor dysfunction by comparing the differences between their range of motion and that of normal subjects. We convert these differences into Fugl–Meyer assessment (FMA) scores in order to quantify them. Finally, we implemented the proposed method in the Unity framework and built a Virtual Reality platform that provides users with human-computer interaction to make the task more enjoyable and to ensure their active participation in the assessment process. The goal is to provide a suitable means of assessing movement disorders without requiring the immediate supervision of a physician.
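With two cameras mounted orthogonally, a simple idealized way to recover 3D joint coordinates is to take the lateral coordinate from the front view, the depth coordinate from the side view, and the shared height from both. The sketch below assumes orthographic-like scaling, no lens distortion, and a common vertical axis; it is an illustration, not the paper's calibration or reconstruction procedure.

```python
import numpy as np

def fuse_orthogonal_views(front_kpts, side_kpts, scale_front, scale_side):
    """front_kpts, side_kpts: (K, 2) pixel keypoints from a front camera (x-y plane)
    and a side camera (z-y plane). scale_*: meters-per-pixel for each view.
    Returns (K, 3) joint positions under the idealized orthogonal-camera assumption."""
    x = front_kpts[:, 0] * scale_front                     # lateral position from the front view
    z = side_kpts[:, 0] * scale_side                       # depth from the side view
    y = 0.5 * (front_kpts[:, 1] * scale_front +            # height seen by both views: average
               side_kpts[:, 1] * scale_side)
    return np.stack([x, y, z], axis=1)
```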
Funding: Supported by the National Natural Science Foundation of China (52130801, U20A20312, 52178271, and 52077213) and the National Key Research and Development Program of China (2021YFF0500903).
Abstract: Identifying workers' construction activities or behaviors can enable managers to better monitor labor efficiency and construction progress. However, current activity analysis methods for construction workers rely solely on manual observation and recording, which consumes considerable time and incurs high labor costs. Researchers have focused on monitoring the on-site construction activities of workers; however, when multiple workers are working together, current approaches cannot accurately and automatically identify each construction activity. This research proposes a deep learning framework for the automated analysis of the construction activities of multiple workers. In this framework, multiple deep neural network models are designed and used to perform worker keypoint extraction, worker tracking, and worker construction activity analysis. The designed framework was tested at an actual construction site, and activity recognition for multiple workers was performed, indicating the feasibility of the framework for the automated monitoring of work efficiency.
Funding: National Natural Science Foundation of China (51075207); Startup Foundation for Introduced Talents of Nanjing University of Aeronautics and Astronautics (1007-YAH10047).
Abstract: The lack of autonomous aerial refueling capabilities is one of the greatest limitations of unmanned aerial vehicles. This paper discusses the vision-based estimation of the relative pose of a tanker and an unmanned aerial vehicle, which is a key issue in autonomous aerial refueling. The main task of this paper is to study relative pose estimation for a tanker and unmanned aerial vehicle in the phase of commencing refueling and during refueling. The employed algorithm includes the initialization of the orientation parameters and an orthogonal iteration algorithm to estimate the optimal rotation matrix and translation vector. In simulation experiments, because the rotation angle varies only slightly during aerial refueling, the method that initializes the rotation matrix with the identity matrix is found to be the most stable and accurate among the tested methods. Finally, the paper discusses the effects of the number and configuration of feature points on the accuracy of the estimation results when using this method.
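For orientation, the classical object-space orthogonal iteration alternates between solving for the optimal translation given the current rotation and re-solving the rotation by Procrustes alignment of the model points with their projections onto the lines of sight. The sketch below is a generic version of that algorithm (initialized with the identity rotation, as the abstract suggests), not necessarily the paper's exact implementation; function and variable names are illustrative.

```python
import numpy as np

def orthogonal_iteration(model_pts, img_pts, R0=np.eye(3), n_iter=30):
    """model_pts: (N, 3) feature points in the tanker frame; img_pts: (N, 2) normalized
    image coordinates (intrinsics removed). Returns the estimated rotation and translation."""
    P = np.asarray(model_pts, dtype=float)
    q = np.hstack([img_pts, np.ones((len(P), 1))])                       # homogeneous image rays
    V = np.stack([np.outer(v, v) / (v @ v) for v in q])                  # line-of-sight projectors
    A = np.linalg.inv(np.eye(3) - V.mean(axis=0)) / len(P)

    def best_t(R):
        # Closed-form optimal translation for a fixed rotation
        return A @ np.sum([(V[i] - np.eye(3)) @ (R @ P[i]) for i in range(len(P))], axis=0)

    R = R0
    for _ in range(n_iter):
        t = best_t(R)
        Q = np.stack([V[i] @ (R @ P[i] + t) for i in range(len(P))])     # project onto lines of sight
        Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
        U, _, Vt = np.linalg.svd(Qc.T @ Pc)                              # Procrustes update of R
        R = U @ np.diag([1, 1, np.sign(np.linalg.det(U @ Vt))]) @ Vt
    return R, best_t(R)
```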
Funding: Supported by the National Natural Science Foundation of China (61401001, 61501003, 61672032).
Abstract: A multi-residual module stacked hourglass network (MRSH) was proposed to improve the accuracy and robustness of human body pose estimation. The network uses multiple hourglass sub-networks and three new residual modules. In the hourglass sub-network, the large receptive field residual module (LRFRM) and the multi-scale residual module (MSRM) are first used to learn the spatial relationships between features and body parts at various scales; only the improved residual module (IRM) is used when the resolution is at its minimum. The final network uses four stacked hourglass sub-networks, with intermediate supervision at the end of each hourglass, repeating high-to-low (high resolution to low resolution) and low-to-high (low resolution to high resolution) learning. The network was tested on the public Leeds Sports Pose (LSP) and MPII Human Pose datasets. The experimental results show that the proposed network achieves better performance in human pose estimation.
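A compact sketch of the stacked hourglass pattern with intermediate supervision follows. Plain convolution blocks stand in for the LRFRM, MSRM, and IRM residual modules, whose internals the abstract does not specify, and the channel and stack counts are assumptions.

```python
import torch
import torch.nn as nn

class Hourglass(nn.Module):
    """Recursive hourglass: repeated high-to-low and low-to-high processing with skip paths."""
    def __init__(self, channels, depth=4):
        super().__init__()
        self.skip = nn.Conv2d(channels, channels, 3, padding=1)
        self.down = nn.Conv2d(channels, channels, 3, padding=1)
        self.inner = (Hourglass(channels, depth - 1) if depth > 1
                      else nn.Conv2d(channels, channels, 3, padding=1))
        self.up = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        skip = self.skip(x)
        low = self.inner(self.down(nn.functional.max_pool2d(x, 2)))
        up = nn.functional.interpolate(self.up(low), scale_factor=2, mode="nearest")
        return skip + up

class StackedHourglass(nn.Module):
    """Four stacked hourglasses; each produces heatmaps used for intermediate supervision."""
    def __init__(self, channels=64, num_joints=16, num_stacks=4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 7, stride=2, padding=3)
        self.stages = nn.ModuleList(Hourglass(channels) for _ in range(num_stacks))
        self.heads = nn.ModuleList(nn.Conv2d(channels, num_joints, 1) for _ in range(num_stacks))
        self.remap = nn.ModuleList(nn.Conv2d(num_joints, channels, 1) for _ in range(num_stacks))

    def forward(self, img):
        x, heatmaps = torch.relu(self.stem(img)), []
        for hg, head, remap in zip(self.stages, self.heads, self.remap):
            feat = hg(x)
            hm = head(feat)                       # intermediate prediction, supervised with a loss
            heatmaps.append(hm)
            x = x + feat + remap(hm)              # feed predictions back into the next stack
        return heatmaps                           # one loss term per stack (intermediate supervision)
```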
Funding: Supported by the National Key R&D Plan (2017YFB1301104), NSFC (61877040, 61772351), the Sci-Tech Innovation Fundamental Scientific Research Funds (025195305000, 19210010005), and the Academy for Multidisciplinary Studies of Capital Normal University.
Abstract: Error or drift is frequently produced in pose estimation by geometric "feature detection and tracking" monocular visual odometry (VO) when the camera moves faster than 1.5 m/s. Meanwhile, in most VO methods based on deep learning, the weight factors take fixed values, which easily leads to overfitting. A new measurement system for monocular visual odometry, named Deep Learning Visual Odometry (DLVO), is proposed based on neural networks. In this system, a Convolutional Neural Network (CNN) is used to extract features and perform feature matching, and a Recurrent Neural Network (RNN) is used for sequence modeling to estimate the camera's 6-DoF poses. Instead of fixed CNN weight values, Bayesian distributions over the weight factors are introduced in order to effectively address network overfitting. The 18,726 frames in the KITTI dataset are used to train the network, and this treatment increases the generalization ability of the network model during prediction. Compared with the original Recurrent Convolutional Neural Network (RCNN), our method reduces the test loss by 5.33%, and it improves the robustness of the translation and rotation estimates over traditional VO methods.
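The CNN-plus-RNN pipeline can be sketched as follows: stacked frame pairs are encoded by a CNN, an LSTM models the sequence, and a linear head regresses 6-DoF relative poses. This is a minimal illustration with assumed layer sizes, not the DLVO architecture, and the Bayesian treatment of the CNN weights is only noted in a comment rather than implemented.

```python
import torch
import torch.nn as nn

class DeepVO(nn.Module):
    """Sketch of a CNN + RNN visual odometry pipeline.
    (A Bayesian treatment would place distributions over the CNN weights, e.g. via
    variational layers or Monte Carlo dropout; here the weights are ordinary point estimates.)"""
    def __init__(self, feat_dim=256, hidden=512):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(),      # 6 channels: two stacked frames
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 6)             # 3 translation + 3 rotation parameters

    def forward(self, clips):                         # clips: (B, T, 6, H, W), consecutive frame pairs
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out)                         # (B, T, 6) relative pose per time step
```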