贵刊1995年第12期 p.30《怎样理解这三个句子》一文,读后颇受启发。但该文有一处注释写道:“suppose,supposing 引导条件状语从句时,仅用于问句。”笔者认为,这一说法欠妥。请看以下例证:Suppose white were black,you might be right....贵刊1995年第12期 p.30《怎样理解这三个句子》一文,读后颇受启发。但该文有一处注释写道:“suppose,supposing 引导条件状语从句时,仅用于问句。”笔者认为,这一说法欠妥。请看以下例证:Suppose white were black,you might be right.假如白的即是黑的,那末你或许就对了。(《英汉大词典》下卷 p.3490)Suppose(Supposing)you miss your tiger,he is not likely to miss you.你如果打不着老虎,老虎不见得吃不着你。(《英华大词典》修订第二版 p.1399)展开更多
The goal of the present study is to investigate the relationship between pupils’problem posing and problem solving abilities,their beliefs about problem posing and problem solving,and their general mathematics abilit...The goal of the present study is to investigate the relationship between pupils’problem posing and problem solving abilities,their beliefs about problem posing and problem solving,and their general mathematics abilities,in a Chinese context.Five instruments,i.e.,a problem posing test,a problem solving test,a problem posing questionnaire,a problem solving questionnaire,and a standard achievement test,were administered to 69 Chinese fifth-grade pupils to assess these five variables and analyze their mutual relationships.Results revealed strong correlations between pupils’problem posing and problem solving abilities and beliefs,and their general mathematical abilities.展开更多
This work presents UNO,a unified monocular visual odometry framework that enables robust and adaptable pose estimation across diverse environments,platforms and motion patterns.Unlike traditional methods that rely on ...This work presents UNO,a unified monocular visual odometry framework that enables robust and adaptable pose estimation across diverse environments,platforms and motion patterns.Unlike traditional methods that rely on deploymentspecific tuning or predefined motion priors,our approach generalises effectively across a wide range of real-world scenarios,including autonomous vehicles,aerial drones,mobile robots and handheld devices.To this end,we introduce a mixture-of-experts strategy for local state estimation,with several specialised decoders that each handle a distinct class of ego-motion patterns.Moreover,we introduce a fully differentiable Gumbel-softmax module that constructs a robust interframe correlation graph,selects the optimal expert decoder and prunes erroneous estimates.These cues are then fed into a unified back-end that combines pretrained scale-independent depth priors with a lightweight bundling adjustment to enforce geometric consistency.We extensively evaluate our method on three major benchmark datasets:KITTI(outdoor/autonomous driving),EuRoC-MAV(indoor/aerial drones)and TUM-RGBD(indoor/handheld),demonstrating state-of-theart performance.展开更多
The 6D pose estimation of objects is of great significance for the intelligent assembly and sorting of industrial parts.In the industrial robot production scenarios,the 6D pose estimation of industrial parts mainly fa...The 6D pose estimation of objects is of great significance for the intelligent assembly and sorting of industrial parts.In the industrial robot production scenarios,the 6D pose estimation of industrial parts mainly faces two challenges:one is the loss of information and interference caused by occlusion and stacking in the sorting scenario,the other is the difficulty of feature extraction due to the weak texture of industrial parts.To address the above problems,this paper proposes an attention-based pixel-level voting network for 6D pose estimation of weakly textured industrial parts,namely CB-PVNet.On the one hand,the voting scheme can predict the keypoints of affected pixels,which improves the accuracy of keypoint localization even in scenarios such as weak texture and partial occlusion.On the other hand,the attention mechanism can extract interesting features of the object while suppressing useless features of surroundings.Extensive comparative experiments were conducted on both public datasets(including LINEMOD,Occlusion LINEMOD and T-LESS datasets)and self-made datasets.The experimental results indicate that the proposed network CB-PVNet can achieve accuracy of ADD(-s)comparable to state-of-the-art using only RGB images while ensuring real-time performance.Additionally,we also conducted robot grasping experiments in the real world.The balance between accuracy and computational efficiency makes the method well-suited for applications in industrial automation.展开更多
Real-time multi-person pose estimation(MPE)built upon neural network architectures aims to simultaneously detect multiple human instances and regress joint coordinates in dynamic scenes.However,due to factors such as ...Real-time multi-person pose estimation(MPE)built upon neural network architectures aims to simultaneously detect multiple human instances and regress joint coordinates in dynamic scenes.However,due to factors such as high model complexity and limited expression of keypoint information,both the efficiency and accuracy of real-time MPE remain to be improved.To mitigate the adverse impacts caused by the aforementioned issues,this work develops FSEM-Pose,a real-time MPE model rooted in the YOLOv10 framework.In detail,first,FSEM-Pose upgrades the backbone module of the baseline network by introducing the Feature Shuffling-Convolution(FS-Conv),which effectively reduces the backbone size while maximizing the retention of spatial information from the input image.Second,FSEM-Pose incorporates a Feature Saliency Enhancement Module(FSEM)to strengthen the feature encoding of human keypoints,thereby improving the accuracy of pose estimation.Finally,FSEM-Pose further enhances inference efficiency via a lightweight optimization of the head using shared convolutional layers.Our method achieves competitive results across multiple accuracy and efficiency metrics on the MS COCO 2017 and CrowdPose datasets.While being lightweight in design,it improves average precision(AP)by 2.1%and 2.5%,respectively.展开更多
Pull-ups are a very common fitness exercise that can be seen in many gyms.For athletes,it is very important to perform pull-ups correctly and scientifically.The pull-up scoring method designed in this paper can score ...Pull-ups are a very common fitness exercise that can be seen in many gyms.For athletes,it is very important to perform pull-ups correctly and scientifically.The pull-up scoring method designed in this paper can score the quality of pull-up movement scientifically and objectively,and provide guidance to help athletes better complete the pull-up movement.In this method,the OpenPose algorithm is used to identify the coordinates of skeleton points,and then the coordinate data are processed by a Kalman filter to obtain coordinates closer to the true values.Finally,the filtered data are input into the scoring algorithm designed based on the fuzzy comprehensive evaluation algorithm,and the results of the pull-up quality score and the corresponding guidance are obtained.展开更多
During the image generation phase,the parserfree Flow-Style-VTON model(PF-Flow-Style-VTON),which utilizes distilled appearance flows,faces two main challenges:blurring,deformation,occlusion,or loss of the arm or palm ...During the image generation phase,the parserfree Flow-Style-VTON model(PF-Flow-Style-VTON),which utilizes distilled appearance flows,faces two main challenges:blurring,deformation,occlusion,or loss of the arm or palm regions in the generated image when these regions of the person occlude the garment;blurring and deformation in the generated image when the person performs large pose movements and the target garment is complex with detailed patterns.To solve these two problems,an improved virtual try-on network model,denoted as IPF-Flow-Style-VTON,is proposed.Firstly,a target warped garment mask refinement module(M-RM)is introduced to refine the warped garment mask and remove erroneous information in the arm and palm regions,thereby improving the quality of subsequent image generation.Secondly,an improved global attention module(GAM)is integrated into the original image generation network,enhancing the ResUNet’s understanding of global context and optimizing the fusion of local features and global information,thereby further improving image generation quality.Finally,the UniPose model is used to provide the pose keypoint information of the target person image,guiding the task execution during the image generation phase.Experiments conducted on the VITON dataset show that the proposed method outperforms the original method,Flow-Style-VTON,by 5.4%,0.3%,6.7%,and 2.2%in Frchet inception distance(FID),structural similarity index measure(SSIM),learned perceptual image patch similarity(LPIPS),and peak signal-to-noise ratio(PSNR),respectively.Overall,the proposed method effectively improves upon the shortcomings of the original network and achieves better visual results.展开更多
AIM:To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy(CSC)leakage points,thereby enabling ophthalmologists to deliver accurate laser treatment without navigat...AIM:To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy(CSC)leakage points,thereby enabling ophthalmologists to deliver accurate laser treatment without navigational laser equipment.METHODS:A dataset with dual labels(point-level and pixel-level)was first established based on fundus fluorescein angiography(FFA)images of CSC and subsequently divided into training(102 images),validation(40 images),and test(40 images)datasets.An intelligent segmentation method was then developed,based on the You Only Look Once version 8 Pose Estimation(YOLOv8-Pose)model and segment anything model(SAM),to segment CSC leakage points.Next,the YOLOv8-Pose model was trained for 200 epochs,and the best-performing model was selected to form the optimal combination with SAM.Additionally,the classic five types of U-Net series models[i.e.,U-Net,recurrent residual U-Net(R2U-Net),attention U-Net(AttU-Net),recurrent residual attention U-Net(R2AttUNet),and nested U-Net(UNet^(++))]were initialized with three random seeds and trained for 200 epochs,resulting in a total of 15 baseline models for comparison.Finally,based on the metrics including Dice similarity coefficient(DICE),intersection over union(IoU),precision,recall,precisionrecall(PR)curve,and receiver operating characteristic(ROC)curve,the proposed method was compared with baseline models through quantitative and qualitative experiments for leakage point segmentation,thereby demonstrating its effectiveness.RESULTS:With the increase of training epochs,the mAP50-95,Recall,and precision of the YOLOv8-Pose model showed a significant increase and tended to stabilize,and it achieved a preliminary localization success rate of 90%(i.e.,36 images)for CSC leakage points in 40 test images.Using manually expert-annotated pixel-level labels as the ground truth,the proposed method achieved outcomes with a DICE of 57.13%,an IoU of 45.31%,a precision of 45.91%,a recall of 93.57%,an area under the PR curve(AUC-PR)of 0.78 and an area under the ROC curve(AUC-ROC)of 0.97,which enables more accurate segmentation of CSC leakage points.CONCLUSION:By combining the precise localization capability of the YOLOv8-Pose model with the robust and flexible segmentation ability of SAM,the proposed method not only demonstrates the effectiveness of the YOLOv8-Pose model in detecting keypoint coordinates of CSC leakage points from the perspective of application innovation but also establishes a novel approach for accurate segmentation of CSC leakage points through the“detect-then-segment”strategy,thereby providing a potential auxiliary means for the automatic and precise realtime localization of leakage points during traditional laser photocoagulation for CSC.展开更多
Human object detection and recognition is essential for elderly monitoring and assisted living however,models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings.To addre...Human object detection and recognition is essential for elderly monitoring and assisted living however,models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings.To address this,we present SCENET-3D,a transformer-drivenmultimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline.In the first stage,scene analysis,rich geometric and texture descriptors are extracted from RGB frames,including surface-normal histograms,angles between neighboring normals,Zernike moments,directional standard deviation,and Gabor-filter responses.In the second stage,scene-object analysis,non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information.In the third stage,human-pose estimation,silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints,which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons.Features from all three stages are fused and fed in a transformer encoder with multi-head attention to resolve visually similar activities.Experiments on UCLA(95.8%),ETRI-Activity3D(89.4%),andCAD-120(91.2%)demonstrate that combining pseudo-3D skeletonswith rich scene-object fusion significantly improves generalizable activity recognition,enabling safer elderly care,natural human–robot interaction,and robust context-aware robotic perception in real-world environments.展开更多
Human pose estimation is crucial across diverse applications,from healthcare to human-computer interaction.Integrating inertial measurement units(IMUs)with monocular vision methods holds great potential for leveraging...Human pose estimation is crucial across diverse applications,from healthcare to human-computer interaction.Integrating inertial measurement units(IMUs)with monocular vision methods holds great potential for leveraging complementary modalities;however,existing approaches are often limited by IMU drift,noise,and underutilization of visual information.To address these limitations,we propose a novel dual-stream feature extraction framework that effectively combines temporal IMU data and single-view image features for improved pose estimation.Short-term dependencies in IMU sequences are captured with convolutional layers,while a Transformerbased architecture models long-range temporal dynamics.To mitigate IMU drift and inter-sensor inconsistencies,a complementary filtering module is introduced alongside a cross-channel interaction mechanism.Features from the IMU and image streams are then fused via a dedicated fusion module and further refined utilizing a high-precision regression head for accurate pose prediction.Experimental results on benchmark datasets demonstrate that our method significantly outperforms existing techniques in terms of estimation,accuracy,and robustness,validating the effectiveness of our dual-stream architecture.展开更多
Camera Pose Estimating from point and line correspondences is critical in various applications,including robotics,augmented reality,3D reconstruction,and autonomous navigation.Existing methods,such as the Perspective-...Camera Pose Estimating from point and line correspondences is critical in various applications,including robotics,augmented reality,3D reconstruction,and autonomous navigation.Existing methods,such as the Perspective-n-Point(PnP)and Perspective-n-Line(PnL)approaches,offer limited accuracy and robustness in environments with occlusions,noise,or sparse feature data.This paper presents a unified solution,Efficient and Accurate Pose Estimation from Point and Line Correspondences(EAPnPL),combining point-based and linebased constraints to improve pose estimation accuracy and computational efficiency,particularly in low-altitude UAV navigation and obstacle avoidance.The proposed method utilizes quaternion parameterization of the rotation matrix to overcome singularity issues and address challenges in traditional rotation matrix-based formulations.A hybrid optimization framework is developed to integrate both point and line constraints,providing a more robust and stable solution in complex scenarios.The method is evaluated using synthetic and realworld datasets,demonstrating significant improvements in performance over existing techniques.The results indicate that the EAPnPL method enhances accuracy and reduces computational complexity,making it suitable for real-time applications in autonomous UAV systems.This approach offers a promising solution to the limitations of existing camera pose estimation methods,with potential applications in low-altitude navigation,autonomous robotics,and 3D scene reconstruction.展开更多
Robots are key to expanding the scope of space applications.The end-to-end training for robot vision-based detection and precision operations is challenging owing to constraints such as extreme environments and high c...Robots are key to expanding the scope of space applications.The end-to-end training for robot vision-based detection and precision operations is challenging owing to constraints such as extreme environments and high computational overhead.This study proposes a lightweight integrated framework for grasp detection and imitation learning,named GD-IL;it comprises a grasp detection algorithm based on manipulability and Gaussian mixture model(manipulability-GMM),and a grasp trajectory generation algorithm based on a two-stage robot imitation learning algorithm(TS-RIL).In the manipulability-GMM algorithm,we apply GMM clustering and ellipse regression to the object point cloud,propose two judgment criteria to generate multiple candidate grasp bounding boxes for the robot,and use manipulability as a metric for selecting the optimal grasp bounding box.The stages of the TS-RIL algorithm are grasp trajectory learning and robot pose optimization.In the first stage,the robot grasp trajectory is characterized using a second-order dynamic movement primitive model and Gaussian mixture regression(GMM).By adjusting the function form of the forcing term,the robot closely approximates the target-grasping trajectory.In the second stage,a robot pose optimization model is built based on the derived pose error formula and manipulability metric.This model allows the robot to adjust its configuration in real time while grasping,thereby effectively avoiding singularities.Finally,an algorithm verification platform is developed based on a Robot Operating System and a series of comparative experiments are conducted in real-world scenarios.The experimental results demonstrate that GD-IL significantly improves the effectiveness and robustness of grasp detection and trajectory imitation learning,outperforming existing state-of-the-art methods in execution efficiency,manipulability,and success rate.展开更多
Previous multi-view 3D human pose estimation methods neither correlate different human joints in each view nor model learnable correlations between the same joints in different views explicitly,meaning that skeleton s...Previous multi-view 3D human pose estimation methods neither correlate different human joints in each view nor model learnable correlations between the same joints in different views explicitly,meaning that skeleton structure information is not utilized and multi-view pose information is not completely fused.Moreover,existing graph convolutional operations do not consider the specificity of different joints and different views of pose information when processing skeleton graphs,making the correlation weights between nodes in the graph and their neighborhood nodes shared.Existing Graph Convolutional Networks(GCNs)cannot extract global and deeplevel skeleton structure information and view correlations efficiently.To solve these problems,pre-estimated multiview 2D poses are designed as a multi-view skeleton graph to fuse skeleton priors and view correlations explicitly to process occlusion problem,with the skeleton-edge and symmetry-edge representing the structure correlations between adjacent joints in each viewof skeleton graph and the view-edge representing the view correlations between the same joints in different views.To make graph convolution operation mine elaborate and sufficient skeleton structure information and view correlations,different correlation weights are assigned to different categories of neighborhood nodes and further assigned to each node in the graph.Based on the graph convolution operation proposed above,a Residual Graph Convolution(RGC)module is designed as the basic module to be combined with the simplified Hourglass architecture to construct the Hourglass-GCN as our 3D pose estimation network.Hourglass-GCNwith a symmetrical and concise architecture processes three scales ofmulti-viewskeleton graphs to extract local-to-global scale and shallow-to-deep level skeleton features efficiently.Experimental results on common large 3D pose dataset Human3.6M and MPI-INF-3DHP show that Hourglass-GCN outperforms some excellent methods in 3D pose estimation accuracy.展开更多
Self-supervised monocular depth estimation has emerged as a major research focus in recent years,primarily due to the elimination of ground-truth depth dependence.However,the prevailing architectures in this domain su...Self-supervised monocular depth estimation has emerged as a major research focus in recent years,primarily due to the elimination of ground-truth depth dependence.However,the prevailing architectures in this domain suffer from inherent limitations:existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions.These assumptions are often violated in real-world scenarios due to dynamic objects,non-Lambertian reflectance,and unstructured background elements,leading to pervasive artifacts such as depth discontinuities(“holes”),structural collapse,and ambiguous reconstruction.To address these challenges,we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network,enhancing its ability to model complex scene dynamics.Our contributions are threefold:(1)a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations;(2)a physically-informed loss function that couples dynamic pose and depth predictions,designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles;(3)an efficient SE(3)transformation parameterization that streamlines network complexity and temporal pre-processing.Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity,significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.展开更多
With the development of computer vision technology,deep learning-based pose estimation and target detection have been widely used in the fields of human behavior analysis and intelligent security.However,owing to the ...With the development of computer vision technology,deep learning-based pose estimation and target detection have been widely used in the fields of human behavior analysis and intelligent security.However,owing to the complexity of animal poses and the diversity of species,the existing pose estimation methods still face many challenges when applied to animal targets.To solve this problem,an improved YOLO-Pose model is proposed to improve the accuracy and efficiency of animal pose estimation.On the basis of the original YOLO-Pose model,a separable kernel attention mechanism is introduced and improved to make it conform to the animal target,and combined with the spatial pyramid pool of YOLO-Pose,the multiscale feature fusion capability of the model is improved.The experimental results show that the improved YOLO-Pose model achieves excellent performance on both the public animal pose dataset and the AP-10K dataset,significantly improving the ability of target detection and pose estimation.展开更多
This paper presents a manifold-optimized Error-State Kalman Filter(ESKF)framework for unmanned aerial vehicle(UAV)pose estimation,integrating Inertial Measurement Unit(IMU)data with GPS or LiDAR to enhance estimation ...This paper presents a manifold-optimized Error-State Kalman Filter(ESKF)framework for unmanned aerial vehicle(UAV)pose estimation,integrating Inertial Measurement Unit(IMU)data with GPS or LiDAR to enhance estimation accuracy and robustness.We employ a manifold-based optimization approach,leveraging exponential and logarithmic mappings to transform rotation vectors into rotation matrices.The proposed ESKF framework ensures state variables remain near the origin,effectively mitigating singularity issues and enhancing numerical stability.Additionally,due to the small magnitude of state variables,second-order terms can be neglected,simplifying Jacobian matrix computation and improving computational efficiency.Furthermore,we introduce a novel Kalman filter gain computation strategy that dynamically adapts to low-dimensional and high-dimensional observation equations,enabling efficient processing across different sensor modalities.Specifically,for resource-constrained UAV platforms,this method significantly reduces computational cost,making it highly suitable for real-time UAV applications.展开更多
文摘贵刊1995年第12期 p.30《怎样理解这三个句子》一文,读后颇受启发。但该文有一处注释写道:“suppose,supposing 引导条件状语从句时,仅用于问句。”笔者认为,这一说法欠妥。请看以下例证:Suppose white were black,you might be right.假如白的即是黑的,那末你或许就对了。(《英汉大词典》下卷 p.3490)Suppose(Supposing)you miss your tiger,he is not likely to miss you.你如果打不着老虎,老虎不见得吃不着你。(《英华大词典》修订第二版 p.1399)
基金This research was supported by Grant Education Sciences Planning(JG10DB223)“Experimental research on the development of pupils’problem posing ability in Shenyang City”from the Research Fund of the Shenyang Educational Committeeby Grant GOA 2012/10“Number sense:Analysis and improvement”from the Research Fund of the Katholieke Universiteit Leuven,Belgium.
文摘The goal of the present study is to investigate the relationship between pupils’problem posing and problem solving abilities,their beliefs about problem posing and problem solving,and their general mathematics abilities,in a Chinese context.Five instruments,i.e.,a problem posing test,a problem solving test,a problem posing questionnaire,a problem solving questionnaire,and a standard achievement test,were administered to 69 Chinese fifth-grade pupils to assess these five variables and analyze their mutual relationships.Results revealed strong correlations between pupils’problem posing and problem solving abilities and beliefs,and their general mathematical abilities.
基金supported by the Technology Project Managed by the State Grid Corporation of China(Grant 5700-202416334A-2-1-ZX).
文摘This work presents UNO,a unified monocular visual odometry framework that enables robust and adaptable pose estimation across diverse environments,platforms and motion patterns.Unlike traditional methods that rely on deploymentspecific tuning or predefined motion priors,our approach generalises effectively across a wide range of real-world scenarios,including autonomous vehicles,aerial drones,mobile robots and handheld devices.To this end,we introduce a mixture-of-experts strategy for local state estimation,with several specialised decoders that each handle a distinct class of ego-motion patterns.Moreover,we introduce a fully differentiable Gumbel-softmax module that constructs a robust interframe correlation graph,selects the optimal expert decoder and prunes erroneous estimates.These cues are then fed into a unified back-end that combines pretrained scale-independent depth priors with a lightweight bundling adjustment to enforce geometric consistency.We extensively evaluate our method on three major benchmark datasets:KITTI(outdoor/autonomous driving),EuRoC-MAV(indoor/aerial drones)and TUM-RGBD(indoor/handheld),demonstrating state-of-theart performance.
基金supported by the Knowledge Innovation Program of Wuhan-Shuguang Project(Grant No.2023010201020443)the School-Level Scientific Research Project Funding Program of Jianghan University(Grant No.2022XKZX33)the Natural Science Foundation of Hubei Province(Grant No.2024AFB466).
文摘The 6D pose estimation of objects is of great significance for the intelligent assembly and sorting of industrial parts.In the industrial robot production scenarios,the 6D pose estimation of industrial parts mainly faces two challenges:one is the loss of information and interference caused by occlusion and stacking in the sorting scenario,the other is the difficulty of feature extraction due to the weak texture of industrial parts.To address the above problems,this paper proposes an attention-based pixel-level voting network for 6D pose estimation of weakly textured industrial parts,namely CB-PVNet.On the one hand,the voting scheme can predict the keypoints of affected pixels,which improves the accuracy of keypoint localization even in scenarios such as weak texture and partial occlusion.On the other hand,the attention mechanism can extract interesting features of the object while suppressing useless features of surroundings.Extensive comparative experiments were conducted on both public datasets(including LINEMOD,Occlusion LINEMOD and T-LESS datasets)and self-made datasets.The experimental results indicate that the proposed network CB-PVNet can achieve accuracy of ADD(-s)comparable to state-of-the-art using only RGB images while ensuring real-time performance.Additionally,we also conducted robot grasping experiments in the real world.The balance between accuracy and computational efficiency makes the method well-suited for applications in industrial automation.
基金supported by the Talent Startup Program of Huangshan University under Grant No.2025xkjq003Additional partial funding was gratefully received from the Scientific Research Project of the Anhui Provincial Department of Education under Grant No.2025AHGXZK40303.
文摘Real-time multi-person pose estimation(MPE)built upon neural network architectures aims to simultaneously detect multiple human instances and regress joint coordinates in dynamic scenes.However,due to factors such as high model complexity and limited expression of keypoint information,both the efficiency and accuracy of real-time MPE remain to be improved.To mitigate the adverse impacts caused by the aforementioned issues,this work develops FSEM-Pose,a real-time MPE model rooted in the YOLOv10 framework.In detail,first,FSEM-Pose upgrades the backbone module of the baseline network by introducing the Feature Shuffling-Convolution(FS-Conv),which effectively reduces the backbone size while maximizing the retention of spatial information from the input image.Second,FSEM-Pose incorporates a Feature Saliency Enhancement Module(FSEM)to strengthen the feature encoding of human keypoints,thereby improving the accuracy of pose estimation.Finally,FSEM-Pose further enhances inference efficiency via a lightweight optimization of the head using shared convolutional layers.Our method achieves competitive results across multiple accuracy and efficiency metrics on the MS COCO 2017 and CrowdPose datasets.While being lightweight in design,it improves average precision(AP)by 2.1%and 2.5%,respectively.
文摘Pull-ups are a very common fitness exercise that can be seen in many gyms.For athletes,it is very important to perform pull-ups correctly and scientifically.The pull-up scoring method designed in this paper can score the quality of pull-up movement scientifically and objectively,and provide guidance to help athletes better complete the pull-up movement.In this method,the OpenPose algorithm is used to identify the coordinates of skeleton points,and then the coordinate data are processed by a Kalman filter to obtain coordinates closer to the true values.Finally,the filtered data are input into the scoring algorithm designed based on the fuzzy comprehensive evaluation algorithm,and the results of the pull-up quality score and the corresponding guidance are obtained.
基金National Key R&D Program of China(No.2019YFC1521300)。
文摘During the image generation phase,the parserfree Flow-Style-VTON model(PF-Flow-Style-VTON),which utilizes distilled appearance flows,faces two main challenges:blurring,deformation,occlusion,or loss of the arm or palm regions in the generated image when these regions of the person occlude the garment;blurring and deformation in the generated image when the person performs large pose movements and the target garment is complex with detailed patterns.To solve these two problems,an improved virtual try-on network model,denoted as IPF-Flow-Style-VTON,is proposed.Firstly,a target warped garment mask refinement module(M-RM)is introduced to refine the warped garment mask and remove erroneous information in the arm and palm regions,thereby improving the quality of subsequent image generation.Secondly,an improved global attention module(GAM)is integrated into the original image generation network,enhancing the ResUNet’s understanding of global context and optimizing the fusion of local features and global information,thereby further improving image generation quality.Finally,the UniPose model is used to provide the pose keypoint information of the target person image,guiding the task execution during the image generation phase.Experiments conducted on the VITON dataset show that the proposed method outperforms the original method,Flow-Style-VTON,by 5.4%,0.3%,6.7%,and 2.2%in Frchet inception distance(FID),structural similarity index measure(SSIM),learned perceptual image patch similarity(LPIPS),and peak signal-to-noise ratio(PSNR),respectively.Overall,the proposed method effectively improves upon the shortcomings of the original network and achieves better visual results.
基金Supported by the Shenzhen Science and Technology Program(No.JCYJ20240813152704006)the National Natural Science Foundation of China(No.62401259)+2 种基金the Fundamental Research Funds for the Central Universities(No.NZ2024036)the Postdoctoral Fellowship Program of CPSF(No.GZC20242228)High Performance Computing Platform of Nanjing University of Aeronautics and Astronautics。
文摘AIM:To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy(CSC)leakage points,thereby enabling ophthalmologists to deliver accurate laser treatment without navigational laser equipment.METHODS:A dataset with dual labels(point-level and pixel-level)was first established based on fundus fluorescein angiography(FFA)images of CSC and subsequently divided into training(102 images),validation(40 images),and test(40 images)datasets.An intelligent segmentation method was then developed,based on the You Only Look Once version 8 Pose Estimation(YOLOv8-Pose)model and segment anything model(SAM),to segment CSC leakage points.Next,the YOLOv8-Pose model was trained for 200 epochs,and the best-performing model was selected to form the optimal combination with SAM.Additionally,the classic five types of U-Net series models[i.e.,U-Net,recurrent residual U-Net(R2U-Net),attention U-Net(AttU-Net),recurrent residual attention U-Net(R2AttUNet),and nested U-Net(UNet^(++))]were initialized with three random seeds and trained for 200 epochs,resulting in a total of 15 baseline models for comparison.Finally,based on the metrics including Dice similarity coefficient(DICE),intersection over union(IoU),precision,recall,precisionrecall(PR)curve,and receiver operating characteristic(ROC)curve,the proposed method was compared with baseline models through quantitative and qualitative experiments for leakage point segmentation,thereby demonstrating its effectiveness.RESULTS:With the increase of training epochs,the mAP50-95,Recall,and precision of the YOLOv8-Pose model showed a significant increase and tended to stabilize,and it achieved a preliminary localization success rate of 90%(i.e.,36 images)for CSC leakage points in 40 test images.Using manually expert-annotated pixel-level labels as the ground truth,the proposed method achieved outcomes with a DICE of 57.13%,an IoU of 45.31%,a precision of 45.91%,a recall of 93.57%,an area under the PR curve(AUC-PR)of 0.78 and an area under the ROC curve(AUC-ROC)of 0.97,which enables more accurate segmentation of CSC leakage points.CONCLUSION:By combining the precise localization capability of the YOLOv8-Pose model with the robust and flexible segmentation ability of SAM,the proposed method not only demonstrates the effectiveness of the YOLOv8-Pose model in detecting keypoint coordinates of CSC leakage points from the perspective of application innovation but also establishes a novel approach for accurate segmentation of CSC leakage points through the“detect-then-segment”strategy,thereby providing a potential auxiliary means for the automatic and precise realtime localization of leakage points during traditional laser photocoagulation for CSC.
基金funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2025R410),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Human object detection and recognition is essential for elderly monitoring and assisted living however,models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings.To address this,we present SCENET-3D,a transformer-drivenmultimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline.In the first stage,scene analysis,rich geometric and texture descriptors are extracted from RGB frames,including surface-normal histograms,angles between neighboring normals,Zernike moments,directional standard deviation,and Gabor-filter responses.In the second stage,scene-object analysis,non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information.In the third stage,human-pose estimation,silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints,which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons.Features from all three stages are fused and fed in a transformer encoder with multi-head attention to resolve visually similar activities.Experiments on UCLA(95.8%),ETRI-Activity3D(89.4%),andCAD-120(91.2%)demonstrate that combining pseudo-3D skeletonswith rich scene-object fusion significantly improves generalizable activity recognition,enabling safer elderly care,natural human–robot interaction,and robust context-aware robotic perception in real-world environments.
基金support provided by the European University of Atlantic.
文摘Human pose estimation is crucial across diverse applications,from healthcare to human-computer interaction.Integrating inertial measurement units(IMUs)with monocular vision methods holds great potential for leveraging complementary modalities;however,existing approaches are often limited by IMU drift,noise,and underutilization of visual information.To address these limitations,we propose a novel dual-stream feature extraction framework that effectively combines temporal IMU data and single-view image features for improved pose estimation.Short-term dependencies in IMU sequences are captured with convolutional layers,while a Transformerbased architecture models long-range temporal dynamics.To mitigate IMU drift and inter-sensor inconsistencies,a complementary filtering module is introduced alongside a cross-channel interaction mechanism.Features from the IMU and image streams are then fused via a dedicated fusion module and further refined utilizing a high-precision regression head for accurate pose prediction.Experimental results on benchmark datasets demonstrate that our method significantly outperforms existing techniques in terms of estimation,accuracy,and robustness,validating the effectiveness of our dual-stream architecture.
基金funded by the Jiangsu Province Postgraduate Scientific Research and Practice Innovation Program(SJCX240449)projectthe Nanjing University of Information Science and Technology Talent Startup Fund(2022r078).
文摘Camera Pose Estimating from point and line correspondences is critical in various applications,including robotics,augmented reality,3D reconstruction,and autonomous navigation.Existing methods,such as the Perspective-n-Point(PnP)and Perspective-n-Line(PnL)approaches,offer limited accuracy and robustness in environments with occlusions,noise,or sparse feature data.This paper presents a unified solution,Efficient and Accurate Pose Estimation from Point and Line Correspondences(EAPnPL),combining point-based and linebased constraints to improve pose estimation accuracy and computational efficiency,particularly in low-altitude UAV navigation and obstacle avoidance.The proposed method utilizes quaternion parameterization of the rotation matrix to overcome singularity issues and address challenges in traditional rotation matrix-based formulations.A hybrid optimization framework is developed to integrate both point and line constraints,providing a more robust and stable solution in complex scenarios.The method is evaluated using synthetic and realworld datasets,demonstrating significant improvements in performance over existing techniques.The results indicate that the EAPnPL method enhances accuracy and reduces computational complexity,making it suitable for real-time applications in autonomous UAV systems.This approach offers a promising solution to the limitations of existing camera pose estimation methods,with potential applications in low-altitude navigation,autonomous robotics,and 3D scene reconstruction.
基金Supported by National Natural Science Foundation of China(Grant No.52475280)Shaanxi Provincial Natural Science Basic Research Program(Grant No.2025SYSSYSZD-105).
文摘Robots are key to expanding the scope of space applications.The end-to-end training for robot vision-based detection and precision operations is challenging owing to constraints such as extreme environments and high computational overhead.This study proposes a lightweight integrated framework for grasp detection and imitation learning,named GD-IL;it comprises a grasp detection algorithm based on manipulability and Gaussian mixture model(manipulability-GMM),and a grasp trajectory generation algorithm based on a two-stage robot imitation learning algorithm(TS-RIL).In the manipulability-GMM algorithm,we apply GMM clustering and ellipse regression to the object point cloud,propose two judgment criteria to generate multiple candidate grasp bounding boxes for the robot,and use manipulability as a metric for selecting the optimal grasp bounding box.The stages of the TS-RIL algorithm are grasp trajectory learning and robot pose optimization.In the first stage,the robot grasp trajectory is characterized using a second-order dynamic movement primitive model and Gaussian mixture regression(GMM).By adjusting the function form of the forcing term,the robot closely approximates the target-grasping trajectory.In the second stage,a robot pose optimization model is built based on the derived pose error formula and manipulability metric.This model allows the robot to adjust its configuration in real time while grasping,thereby effectively avoiding singularities.Finally,an algorithm verification platform is developed based on a Robot Operating System and a series of comparative experiments are conducted in real-world scenarios.The experimental results demonstrate that GD-IL significantly improves the effectiveness and robustness of grasp detection and trajectory imitation learning,outperforming existing state-of-the-art methods in execution efficiency,manipulability,and success rate.
基金supported in part by the National Natural Science Foundation of China under Grants 61973065,U20A20197,61973063.
文摘Previous multi-view 3D human pose estimation methods neither correlate different human joints in each view nor model learnable correlations between the same joints in different views explicitly,meaning that skeleton structure information is not utilized and multi-view pose information is not completely fused.Moreover,existing graph convolutional operations do not consider the specificity of different joints and different views of pose information when processing skeleton graphs,making the correlation weights between nodes in the graph and their neighborhood nodes shared.Existing Graph Convolutional Networks(GCNs)cannot extract global and deeplevel skeleton structure information and view correlations efficiently.To solve these problems,pre-estimated multiview 2D poses are designed as a multi-view skeleton graph to fuse skeleton priors and view correlations explicitly to process occlusion problem,with the skeleton-edge and symmetry-edge representing the structure correlations between adjacent joints in each viewof skeleton graph and the view-edge representing the view correlations between the same joints in different views.To make graph convolution operation mine elaborate and sufficient skeleton structure information and view correlations,different correlation weights are assigned to different categories of neighborhood nodes and further assigned to each node in the graph.Based on the graph convolution operation proposed above,a Residual Graph Convolution(RGC)module is designed as the basic module to be combined with the simplified Hourglass architecture to construct the Hourglass-GCN as our 3D pose estimation network.Hourglass-GCNwith a symmetrical and concise architecture processes three scales ofmulti-viewskeleton graphs to extract local-to-global scale and shallow-to-deep level skeleton features efficiently.Experimental results on common large 3D pose dataset Human3.6M and MPI-INF-3DHP show that Hourglass-GCN outperforms some excellent methods in 3D pose estimation accuracy.
基金supported in part by the National Natural Science Foundation of China under Grants 62071345。
文摘Self-supervised monocular depth estimation has emerged as a major research focus in recent years,primarily due to the elimination of ground-truth depth dependence.However,the prevailing architectures in this domain suffer from inherent limitations:existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions.These assumptions are often violated in real-world scenarios due to dynamic objects,non-Lambertian reflectance,and unstructured background elements,leading to pervasive artifacts such as depth discontinuities(“holes”),structural collapse,and ambiguous reconstruction.To address these challenges,we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network,enhancing its ability to model complex scene dynamics.Our contributions are threefold:(1)a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations;(2)a physically-informed loss function that couples dynamic pose and depth predictions,designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles;(3)an efficient SE(3)transformation parameterization that streamlines network complexity and temporal pre-processing.Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity,significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.
基金funded by the second batch of Tianchi Talents(Leading Tal-ents)project in Xinjiang Uygur Autonomous Region.Project leader:Lei Liu from School of Computer Science and Technology,Xinjiang University.
文摘With the development of computer vision technology,deep learning-based pose estimation and target detection have been widely used in the fields of human behavior analysis and intelligent security.However,owing to the complexity of animal poses and the diversity of species,the existing pose estimation methods still face many challenges when applied to animal targets.To solve this problem,an improved YOLO-Pose model is proposed to improve the accuracy and efficiency of animal pose estimation.On the basis of the original YOLO-Pose model,a separable kernel attention mechanism is introduced and improved to make it conform to the animal target,and combined with the spatial pyramid pool of YOLO-Pose,the multiscale feature fusion capability of the model is improved.The experimental results show that the improved YOLO-Pose model achieves excellent performance on both the public animal pose dataset and the AP-10K dataset,significantly improving the ability of target detection and pose estimation.
基金National Natural Science Foundation of China(Grant No.62266045)National Science and Technology Major Project of China(No.2022YFE0138600)。
文摘This paper presents a manifold-optimized Error-State Kalman Filter(ESKF)framework for unmanned aerial vehicle(UAV)pose estimation,integrating Inertial Measurement Unit(IMU)data with GPS or LiDAR to enhance estimation accuracy and robustness.We employ a manifold-based optimization approach,leveraging exponential and logarithmic mappings to transform rotation vectors into rotation matrices.The proposed ESKF framework ensures state variables remain near the origin,effectively mitigating singularity issues and enhancing numerical stability.Additionally,due to the small magnitude of state variables,second-order terms can be neglected,simplifying Jacobian matrix computation and improving computational efficiency.Furthermore,we introduce a novel Kalman filter gain computation strategy that dynamically adapts to low-dimensional and high-dimensional observation equations,enabling efficient processing across different sensor modalities.Specifically,for resource-constrained UAV platforms,this method significantly reduces computational cost,making it highly suitable for real-time UAV applications.