Objective: To evaluate the contribution of pre-impregnated screens (PSP) installed at the openings and eaves of dwellings to the reduction of malaria transmission in the commune of Aguegues in Benin. Methods: The PSP were manufactured from pre-impregnated Olyset Net and installed at the windows, eaves and doors of 70 dwellings. A total of 320 children aged 6-59 months were included in the treated zone and 311 children were recruited in the control zone. The variables measured were plasmodic index (IP), gametocyte index, parasite density (PD), fever, hemoglobin and anemia. Results: The overall IP was 16.62% with PSP and 72.20% without PSP. The gametocyte index did not differ significantly between the treated zone (27.8) and the control zone (29.1). The overall geometric mean PD was 309 in the treated zone and 600 in the control zone. The hemoglobin level was 8.7 in the control zone and 9.5 in the treated zone. Anemia was more prevalent in the control zone than in the treated zone. Conclusions: The PSP contributed to a significant reduction in morbidity in the commune of Aguegues.
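As a reading aid, the reported figures can be turned into relative reductions. The sketch below is plain arithmetic on the numbers quoted in the abstract; the function name `relative_reduction` is illustrative, and the calculation is not the authors' statistical analysis (it says nothing about significance or confounding).

```python
def relative_reduction(control_value, treated_value):
    """Relative reduction (in percent) of the treated value versus the control value."""
    return 100.0 * (control_value - treated_value) / control_value


# Figures quoted in the abstract (treated zone with PSP, control zone without).
ip_reduction = relative_reduction(72.20, 16.62)  # plasmodic index, in percent
pd_reduction = relative_reduction(600, 309)      # geometric mean parasite density

print(f"IP relative reduction: {ip_reduction:.1f}%")
print(f"PD relative reduction: {pd_reduction:.1f}%")
```

On these figures the plasmodic index is roughly 77% lower and the geometric mean parasite density roughly 48.5% lower in the treated zone.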
The death of Muammar Gaddafi marks a new era for Libya. It also poses a huge challenge for Libyan authorities dealing with tribal conflicts. He Wenping, a researcher with the Institute of West-Asian and African Studies at the Chinese Academy of Social Sciences, believes that Libya is in danger of falling into a period of internal strife and tribal conflict. Her thoughts are as follows:
Using a thermal manikin, the effects of dressing poses on clothing thermal insulation are studied. The thermal insulation of the still air layer over the human body is found not to be influenced by the dressing pose, but the dressing pose does affect the thermal insulation of the clothing system.
Considerable progress has been made recently on 2D human pose tracking with tracking-by-detection approaches. However, several challenges remain in this area because of self-occlusion and confusion between the left and right limbs during tracking. In this work, a head orientation detection step is introduced into the tracking framework as a complementary tool to assist human pose estimation. With the face orientation determined, the system can decide whether the left or the right side of the human body is fully visible and infer the state of its symmetric counterpart. By granting higher priority to the completely visible side, the system can largely avoid double counting when inferring body poses. The proposed framework is evaluated on the HumanEva dataset. The results show that it largely reduces the occurrence of double counting and distinguishes the left and right sides consistently.
To study the hydrodynamic response of marine structures slamming into water, a mathematical model of wind-fluid-solid interaction is established for a 3D marine structure slamming into waves at free poses in complex wind-wave-flow environments. The model is based on a mechanism analysis of the slamming process and combines the 3D N-S equations and the k-ε turbulence equations with the full 6DOF motion equations of the structure. The numerical results for the slamming wave agree well with the results of the physical model test. Using the mathematical model, the wave-making of a 3D marine structure falling into water at an initial pose in different complex wind, wave and flow environments is investigated. The results show that the various natural factors and the initial pose of the structure influence the slamming wave differently, and that the process follows an obvious rule.
Facial expression recognition (FER) has numerous applications in computer security, neuroscience, psychology, and engineering. Owing to its non-intrusiveness, it is considered a useful technology for combating crime. However, FER is plagued with several challenges, the most serious of which is its poor prediction accuracy in severe head poses. The aim of this study, therefore, is to improve the recognition accuracy in severe head poses by proposing a robust 3D head-tracking algorithm based on an ellipsoidal model, advanced ensemble of AdaBoost, and saturated vector machine (SVM). The FER features are tracked from one frame to the next using the ellipsoidal tracking model, and the visible expressive facial key points are extracted using Gabor filters. The ensemble algorithm (Ada-AdaSVM) is then used for feature selection and classification. The proposed technique is evaluated using the Bosphorus, BU-3DFE, MMI, CK+, and BP4D-Spontaneous facial expression databases. The overall performance is outstanding.
[...] benefits from positional diagrams showing where the players are. These diagrams often show the layout of the players through simple symbols, which provide no information about their poses. This paper investigates whether visualizing player poses is beneficial for the tactical understanding of positional diagrams in padel. We propose a realistic, cartoon-like representation of the players and discuss its integration into a typical positional diagram. To overcome the cost of generating player representations depicting their pose, we propose a method to generate such representations from minimal user input. We conducted a user study to evaluate the effectiveness of our pose-aware diagrams. The tasks for the study were designed to encompass the main in-game scenarios in padel, which include the ballholder at the net with opponents defending, the reverse situation, and transitions between these two states. We found that our representation is preferred over a symbolic one that only indicates player orientation. The proposed method enables coaches to produce such representations within a matter of seconds, thereby significantly facilitating the creation of detailed and easily analyzable depictions of game situations.
Robust non-intrusive eye location plays an important role in vision-based man-machine interaction. A measure based on a modified Hausdorff distance is proposed to localize the eyes; it tolerates various changes in eye pose, shape, and scale. To eliminate the effects of illumination variations, an 8-neighbour-based transformation of the gray images is proposed. The transformed image is less sensitive to illumination changes while preserving the appearance information of the eyes. All the localized eye candidates are then identified by back-propagation neural networks. Experiments demonstrate that the method is able to localize eyes of different sizes, shapes, and poses under different illuminations.
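The abstract does not say which modification of the Hausdorff distance is used. As an illustration only, the sketch below assumes the common Dubuisson-Jain variant, which replaces the maximum over points with a mean of nearest-neighbour distances and is therefore less sensitive to a few outlying edge points. The point sets and function names are hypothetical.

```python
import math


def directed_mean_distance(a_points, b_points):
    """Mean, over points in A, of the distance to the nearest point in B."""
    return sum(min(math.dist(a, b) for b in b_points) for a in a_points) / len(a_points)


def modified_hausdorff(a_points, b_points):
    """Dubuisson-Jain modified Hausdorff distance between two non-empty point sets."""
    return max(
        directed_mean_distance(a_points, b_points),
        directed_mean_distance(b_points, a_points),
    )


# Hypothetical edge points of an eye template and of an image patch shifted by one pixel.
template = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
candidate = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

score = modified_hausdorff(template, candidate)  # lower means a better match
```

In a localization loop, the template would be slid over the edge map and the window with the lowest score kept as a candidate.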
During the image generation phase, the parser-free Flow-Style-VTON model (PF-Flow-Style-VTON), which utilizes distilled appearance flows, faces two main challenges: blurring, deformation, occlusion, or loss of the arm or palm regions in the generated image when these regions of the person occlude the garment; and blurring and deformation in the generated image when the person performs large pose movements and the target garment is complex with detailed patterns. To solve these two problems, an improved virtual try-on network model, denoted as IPF-Flow-Style-VTON, is proposed. Firstly, a target warped garment mask refinement module (M-RM) is introduced to refine the warped garment mask and remove erroneous information in the arm and palm regions, thereby improving the quality of subsequent image generation. Secondly, an improved global attention module (GAM) is integrated into the original image generation network, enhancing the ResUNet's understanding of global context and optimizing the fusion of local features and global information, thereby further improving image generation quality. Finally, the UniPose model is used to provide the pose keypoint information of the target person image, guiding the task execution during the image generation phase. Experiments conducted on the VITON dataset show that the proposed method outperforms the original method, Flow-Style-VTON, by 5.4%, 0.3%, 6.7%, and 2.2% in Fréchet inception distance (FID), structural similarity index measure (SSIM), learned perceptual image patch similarity (LPIPS), and peak signal-to-noise ratio (PSNR), respectively. Overall, the proposed method effectively improves upon the shortcomings of the original network and achieves better visual results.
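Of the four metrics quoted, PSNR is the simplest to state: it is 10·log10(MAX² / MSE), where MSE is the mean squared pixel error. The sketch below is a minimal pure-Python version over flattened pixel lists, assuming 8-bit intensities (MAX = 255); the function name and the sample pixels are illustrative, not taken from the paper's evaluation code.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical flattened grayscale patches.
reference_patch = [10, 20, 30, 40]
generated_patch = [12, 18, 33, 40]

quality_db = psnr(reference_patch, generated_patch)  # higher is better
```

Higher PSNR and SSIM indicate better results, whereas lower FID and LPIPS do, so the percentage gains in the abstract presumably refer to increases for the former pair and decreases for the latter.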
AIM: To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy (CSC) leakage points, thereby enabling ophthalmologists to deliver accurate laser treatment without navigational laser equipment. METHODS: A dataset with dual labels (point-level and pixel-level) was first established based on fundus fluorescein angiography (FFA) images of CSC and subsequently divided into training (102 images), validation (40 images), and test (40 images) datasets. An intelligent segmentation method was then developed, based on the You Only Look Once version 8 Pose Estimation (YOLOv8-Pose) model and the segment anything model (SAM), to segment CSC leakage points. Next, the YOLOv8-Pose model was trained for 200 epochs, and the best-performing model was selected to form the optimal combination with SAM. Additionally, five classic U-Net series models [i.e., U-Net, recurrent residual U-Net (R2U-Net), attention U-Net (AttU-Net), recurrent residual attention U-Net (R2AttUNet), and nested U-Net (UNet++)] were initialized with three random seeds and trained for 200 epochs, resulting in a total of 15 baseline models for comparison. Finally, based on metrics including the Dice similarity coefficient (DICE), intersection over union (IoU), precision, recall, precision-recall (PR) curve, and receiver operating characteristic (ROC) curve, the proposed method was compared with the baseline models through quantitative and qualitative experiments on leakage point segmentation, thereby demonstrating its effectiveness. RESULTS: As the number of training epochs increased, the mAP50-95, recall, and precision of the YOLOv8-Pose model rose significantly and then stabilized, and the model achieved a preliminary localization success rate of 90% (i.e., 36 images) for CSC leakage points in the 40 test images. Using pixel-level labels manually annotated by experts as the ground truth, the proposed method achieved a DICE of 57.13%, an IoU of 45.31%, a precision of 45.91%, a recall of 93.57%, an area under the PR curve (AUC-PR) of 0.78 and an area under the ROC curve (AUC-ROC) of 0.97, which enables more accurate segmentation of CSC leakage points. CONCLUSION: By combining the precise localization capability of the YOLOv8-Pose model with the robust and flexible segmentation ability of SAM, the proposed method not only demonstrates the effectiveness of the YOLOv8-Pose model in detecting keypoint coordinates of CSC leakage points from the perspective of application innovation but also establishes a novel approach for accurate segmentation of CSC leakage points through the "detect-then-segment" strategy, thereby providing a potential auxiliary means for the automatic and precise real-time localization of leakage points during traditional laser photocoagulation for CSC.
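The overlap metrics named in the abstract have standard definitions on binary masks. The sketch below computes DICE, IoU, precision and recall for a single flattened mask pair; the function name and the toy masks are illustrative and not taken from the study, and the convention of returning 1.0 when both masks are empty is an assumption.

```python
def overlap_metrics(predicted, truth):
    """DICE, IoU, precision and recall for two flattened binary masks (0/1 values)."""
    true_positive = sum(1 for p, t in zip(predicted, truth) if p and t)
    predicted_positive = sum(1 for p in predicted if p)
    actual_positive = sum(1 for t in truth if t)
    union = predicted_positive + actual_positive - true_positive
    total = predicted_positive + actual_positive
    return {
        "dice": 2 * true_positive / total if total else 1.0,
        "iou": true_positive / union if union else 1.0,
        "precision": true_positive / predicted_positive if predicted_positive else 1.0,
        "recall": true_positive / actual_positive if actual_positive else 1.0,
    }


# Toy example: the prediction covers the whole lesion but is three times too large.
predicted_mask = [1, 1, 1, 0]
truth_mask = [1, 0, 0, 0]

metrics = overlap_metrics(predicted_mask, truth_mask)
```

The toy case mirrors the pattern in the reported results: an over-segmented prediction keeps recall high while precision, DICE and IoU stay much lower.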
Human object detection and recognition is essential for elderly monitoring and assisted living; however, models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings. To address this, we present SCENET-3D, a transformer-driven multimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline. In the first stage, scene analysis, rich geometric and texture descriptors are extracted from RGB frames, including surface-normal histograms, angles between neighboring normals, Zernike moments, directional standard deviation, and Gabor-filter responses. In the second stage, scene-object analysis, non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information. In the third stage, human-pose estimation, silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints, which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons. Features from all three stages are fused and fed into a transformer encoder with multi-head attention to resolve visually similar activities. Experiments on UCLA (95.8%), ETRI-Activity3D (89.4%), and CAD-120 (91.2%) demonstrate that combining pseudo-3D skeletons with rich scene-object fusion significantly improves generalizable activity recognition, enabling safer elderly care, natural human–robot interaction, and robust context-aware robotic perception in real-world environments.
Forecasting 3-dimensional skeleton-based human poses from the historical sequence is a classic task, which shows enormous potential in robotics, computer vision, and graphics. Currently, the state-of-the-art methods resort to graph convolutional networks (GCNs) to access the relationships of human joint pairs to formulate this problem. However, human action involves complex interactions among multiple joints, which presents a higher-order correlation overstepping the pairwise (2-order) connection of GCNs. Moreover, joints are typically activated by the parent joint, rather than driving their parent joints, whereas in existing methods, this specific direction of information transmission is ignored. In this work, we propose a novel hybrid directed hypergraph convolution network (H-DHGCN) to model the high-order relationships of the human skeleton with directionality. Specifically, our H-DHGCN mainly involves 2 core components. The first is the static directed hypergraph, which is pre-defined according to the human body structure, to effectively leverage the natural relations of human joints. The second is the dynamic directed hypergraph (D-DHG). D-DHG is learnable and can be constructed adaptively, to learn the unique characteristics of the motion sequence. In contrast to the typical GCNs, our method brings a richer and more refined topological representation of skeleton data. On several large-scale benchmarks, experimental results show that the proposed model consistently surpasses the latest techniques.
Aortic regurgitation (AR) poses distinct challenges in interventional cardiology, necessitating novel approaches for treatment. This editorial examined the evolving landscape of transcatheter aortic valve replacement (TAVR) as an alternative therapeutic strategy for AR, particularly in patients deemed high risk for surgery. We explored the anatomical and pathophysiological disparities between AR and aortic stenosis (AS) and elucidated the technical nuances of TAVR procedures in AR patients, emphasizing the need for precise prosthesis positioning and considerations for excessive stroke volume. Additionally, we discussed the safety and efficacy of TAVR compared to SAVR in AR management, drawing insights from recent case series and registry data. Notably, dedicated TAVR devices tailored for AR, such as the J-Valve and JenaValve, demonstrate promising outcomes in reducing residual AR and ensuring procedural success. Conversely, "off-label" TAVR devices, including balloon-expandable and self-expandable platforms, offer feasible alternatives, particularly for large aortic annuli, with favorable device success rates and low residual AR rates. We highlighted the need for further research, including randomized trials, to delineate the definitive role of TAVR in AR treatment and to address remaining questions regarding device selection and long-term outcomes. In conclusion, TAVR emerges as a viable option for patients with AR, particularly those facing high surgical risks or frailty, with ongoing investigations poised to refine its position in the therapeutic armamentarium.
Camera pose estimation from point and line correspondences is critical in various applications, including robotics, augmented reality, 3D reconstruction, and autonomous navigation. Existing methods, such as the Perspective-n-Point (PnP) and Perspective-n-Line (PnL) approaches, offer limited accuracy and robustness in environments with occlusions, noise, or sparse feature data. This paper presents a unified solution, Efficient and Accurate Pose Estimation from Point and Line Correspondences (EAPnPL), combining point-based and line-based constraints to improve pose estimation accuracy and computational efficiency, particularly in low-altitude UAV navigation and obstacle avoidance. The proposed method utilizes quaternion parameterization of the rotation matrix to overcome singularity issues and address challenges in traditional rotation matrix-based formulations. A hybrid optimization framework is developed to integrate both point and line constraints, providing a more robust and stable solution in complex scenarios. The method is evaluated using synthetic and real-world datasets, demonstrating significant improvements in performance over existing techniques. The results indicate that the EAPnPL method enhances accuracy and reduces computational complexity, making it suitable for real-time applications in autonomous UAV systems. This approach offers a promising solution to the limitations of existing camera pose estimation methods, with potential applications in low-altitude navigation, autonomous robotics, and 3D scene reconstruction.
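The quaternion parameterization mentioned above can be illustrated with the standard conversion from a unit quaternion q = (w, x, y, z) to a rotation matrix. This is only the textbook mapping, shown as a sketch; it is not the EAPnPL solver, and the function names are illustrative. Normalizing the quaternion first means any non-zero 4-vector yields a valid rotation, which is one reason the parameterization avoids the singular configurations of Euler angles.

```python
import math


def quaternion_to_rotation(w, x, y, z):
    """Convert a non-zero quaternion (w, x, y, z) into a 3x3 rotation matrix."""
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero quaternion does not represent a rotation")
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(matrix, vector):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(row[i] * vector[i] for i in range(3)) for row in matrix]


# A 90-degree rotation about the z-axis maps the x-axis onto the y-axis.
half_angle = math.pi / 4
rotation = quaternion_to_rotation(math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
rotated_x = rotate(rotation, [1.0, 0.0, 0.0])
```

In a pose solver, the four quaternion components (plus the translation) would be the unknowns, and point and line reprojection residuals would be written in terms of this matrix.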
Robots are key to expanding the scope of space applications. End-to-end training for robot vision-based detection and precision operations is challenging owing to constraints such as extreme environments and high computational overhead. This study proposes a lightweight integrated framework for grasp detection and imitation learning, named GD-IL; it comprises a grasp detection algorithm based on manipulability and a Gaussian mixture model (manipulability-GMM), and a grasp trajectory generation algorithm based on a two-stage robot imitation learning algorithm (TS-RIL). In the manipulability-GMM algorithm, we apply GMM clustering and ellipse regression to the object point cloud, propose two judgment criteria to generate multiple candidate grasp bounding boxes for the robot, and use manipulability as a metric for selecting the optimal grasp bounding box. The stages of the TS-RIL algorithm are grasp trajectory learning and robot pose optimization. In the first stage, the robot grasp trajectory is characterized using a second-order dynamic movement primitive model and Gaussian mixture regression (GMR). By adjusting the function form of the forcing term, the robot closely approximates the target-grasping trajectory. In the second stage, a robot pose optimization model is built based on the derived pose error formula and manipulability metric. This model allows the robot to adjust its configuration in real time while grasping, thereby effectively avoiding singularities. Finally, an algorithm verification platform is developed based on the Robot Operating System, and a series of comparative experiments are conducted in real-world scenarios. The experimental results demonstrate that GD-IL significantly improves the effectiveness and robustness of grasp detection and trajectory imitation learning, outperforming existing state-of-the-art methods in execution efficiency, manipulability, and success rate.
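The abstract does not define its manipulability metric. A common choice, assumed here purely for illustration, is Yoshikawa's measure w = sqrt(det(J Jᵀ)), which for a square Jacobian reduces to |det J| and drops to zero at a singular configuration. The sketch below evaluates it for a hypothetical planar two-link arm, not for the robot used in the study.

```python
import math


def planar_two_link_jacobian(theta1, theta2, link1=1.0, link2=1.0):
    """End-effector position Jacobian of a planar arm with two revolute joints."""
    s1, c1 = math.sin(theta1), math.cos(theta1)
    s12, c12 = math.sin(theta1 + theta2), math.cos(theta1 + theta2)
    return [
        [-link1 * s1 - link2 * s12, -link2 * s12],
        [link1 * c1 + link2 * c12, link2 * c12],
    ]


def manipulability(jacobian):
    """Yoshikawa measure for a 2x2 Jacobian: sqrt(det(J J^T)) equals |det J|."""
    (a, b), (c, d) = jacobian
    return abs(a * d - b * c)


# Elbow bent at 90 degrees (well conditioned) versus fully stretched (singular).
bent = manipulability(planar_two_link_jacobian(0.3, math.pi / 2))
stretched = manipulability(planar_two_link_jacobian(0.3, 0.0))
```

For this arm the measure works out to link1 · link2 · |sin θ2|, so ranking candidate grasps by it favours configurations with the elbow away from full extension, which is consistent with the singularity-avoidance goal described above.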
Previous multi-view 3D human pose estimation methods neither correlate different human joints in each view nor explicitly model learnable correlations between the same joints in different views, meaning that skeleton structure information is not utilized and multi-view pose information is not completely fused. Moreover, existing graph convolutional operations do not consider the specificity of different joints and different views of pose information when processing skeleton graphs, so the correlation weights between nodes in the graph and their neighborhood nodes are shared. Existing Graph Convolutional Networks (GCNs) cannot efficiently extract global and deep-level skeleton structure information and view correlations. To solve these problems, pre-estimated multi-view 2D poses are organized as a multi-view skeleton graph that explicitly fuses skeleton priors and view correlations to handle the occlusion problem, with the skeleton-edge and symmetry-edge representing the structure correlations between adjacent joints in each view of the skeleton graph and the view-edge representing the view correlations between the same joints in different views. To make the graph convolution operation mine elaborate and sufficient skeleton structure information and view correlations, different correlation weights are assigned to different categories of neighborhood nodes and further assigned to each node in the graph. Based on this graph convolution operation, a Residual Graph Convolution (RGC) module is designed as the basic module and combined with a simplified Hourglass architecture to construct Hourglass-GCN, our 3D pose estimation network. Hourglass-GCN, with a symmetrical and concise architecture, processes three scales of multi-view skeleton graphs to efficiently extract skeleton features from local to global scale and from shallow to deep level. Experimental results on the common large 3D pose datasets Human3.6M and MPI-INF-3DHP show that Hourglass-GCN outperforms some excellent methods in 3D pose estimation accuracy.
Self-supervised monocular depth estimation has emerged as a major research focus in recent years, primarily due to the elimination of ground-truth depth dependence. However, the prevailing architectures in this domain suffer from inherent limitations: existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions. These assumptions are often violated in real-world scenarios due to dynamic objects, non-Lambertian reflectance, and unstructured background elements, leading to pervasive artifacts such as depth discontinuities ("holes"), structural collapse, and ambiguous reconstruction. To address these challenges, we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network, enhancing its ability to model complex scene dynamics. Our contributions are threefold: (1) a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations; (2) a physically-informed loss function that couples dynamic pose and depth predictions, designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles; (3) an efficient SE(3) transformation parameterization that streamlines network complexity and temporal pre-processing. Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity, significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.
Funding (malaria PSP study, Aguegues): Supported by the Ministry of Higher Education and Scientific Research of the Government of Benin.
Funding (padel positional-diagram study): Supported by MICIU/AEI/10.13039/501100011033 and ERDF/EU under Grant PID2021-122136OB-C21; by the Federal Ministry of Education and Research of Germany and the state of North Rhine-Westphalia as part of the Lamarr Institute for Machine Learning and Artificial Intelligence (Lamarr22B); by the Spanish Ministry of Science and Innovation and ERDF/EU under Grant PRE2018-086835; and by funding from the Department of Research and Universities of the Government of Catalonia (2021 SGR 01035).
文摘benefits from positional diagrams showing where the players are.These diagrams often show the layout of the players through simple symbols,which provide no information about their poses.This paper investigates if the visualization of player poses is beneficial for tactical understanding of positional diagrams in padel.We propose a realistic,cartoon-like representation of the players and discuss its integration into a typical positional diagram.To overcome the cost of generating player representations depicting their pose,we propose a method to generate such representations from minimal user input.We conducted a user study to evaluate the effectiveness of our pose-aware diagrams.The tasks for the study were designed to encompass the main in-game scenarios in padel,which include the ballholder at the net with opponents defending,the reverse situation,and transitions between these two states.We found that our representation is preferred over a symbolic one that only indicates player orientation.The proposed method enables coaches to produce such representations within a matter of seconds,thereby significantly facilitating the creation of detailed and easily analyzable depictions of game situations.
Abstract: Robust non-intrusive eye location plays an important role in vision-based man-machine interaction. A measure based on a modified Hausdorff distance is proposed to localize the eyes; it tolerates various changes in eye pose, shape, and scale. To eliminate the effects of illumination variations, an 8-neighbour-based transformation of the gray images is proposed. The transformed image is less sensitive to illumination changes while preserving the appearance information of the eyes. All the localized eye candidates are identified by back-propagation neural networks. Experiments demonstrate that the method is able to localize eyes of different sizes, shapes, and poses under different illuminations.
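The abstract above does not say which modification of the Hausdorff distance is used. As a minimal sketch only, the following assumes the commonly cited variant that replaces the maximum over nearest-neighbour distances with their mean; the function name `modified_hausdorff` and the toy point sets are illustrative, not taken from the paper.

```python
import math


def modified_hausdorff(a, b):
    """Assumed variant: max of the two directed mean nearest-neighbour distances.

    a, b: non-empty sequences of points (tuples of equal dimension).
    """
    def directed(src, dst):
        # Average, over points in src, of the distance to the closest point in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return max(directed(a, b), directed(b, a))


# Hypothetical edge points of an eye template and a vertically shifted candidate.
template = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
shifted = [(x, y + 0.5) for x, y in template]

print(modified_hausdorff(template, template))  # 0.0
print(modified_hausdorff(template, shifted))   # 0.5
```

Averaging instead of taking the maximum makes the measure less sensitive to a few outlying edge points, which is one plausible reason such a variant would tolerate changes in eye shape and scale.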
Funding: National Key R&D Program of China (No. 2019YFC1521300).
Abstract: During the image generation phase, the parser-free Flow-Style-VTON model (PF-Flow-Style-VTON), which utilizes distilled appearance flows, faces two main challenges: blurring, deformation, occlusion, or loss of the arm or palm regions in the generated image when these regions of the person occlude the garment; and blurring and deformation in the generated image when the person performs large pose movements and the target garment is complex with detailed patterns. To solve these two problems, an improved virtual try-on network model, denoted as IPF-Flow-Style-VTON, is proposed. Firstly, a target warped garment mask refinement module (M-RM) is introduced to refine the warped garment mask and remove erroneous information in the arm and palm regions, thereby improving the quality of subsequent image generation. Secondly, an improved global attention module (GAM) is integrated into the original image generation network, enhancing the ResUNet's understanding of global context and optimizing the fusion of local features and global information, thereby further improving image generation quality. Finally, the UniPose model is used to provide the pose keypoint information of the target person image, guiding the task execution during the image generation phase. Experiments conducted on the VITON dataset show that the proposed method outperforms the original method, Flow-Style-VTON, by 5.4%, 0.3%, 6.7%, and 2.2% in Fréchet inception distance (FID), structural similarity index measure (SSIM), learned perceptual image patch similarity (LPIPS), and peak signal-to-noise ratio (PSNR), respectively. Overall, the proposed method effectively improves upon the shortcomings of the original network and achieves better visual results.
Funding: Supported by the Shenzhen Science and Technology Program (No. JCYJ20240813152704006); the National Natural Science Foundation of China (No. 62401259); the Fundamental Research Funds for the Central Universities (No. NZ2024036); the Postdoctoral Fellowship Program of CPSF (No. GZC20242228); and the High Performance Computing Platform of Nanjing University of Aeronautics and Astronautics.
Abstract: AIM: To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy (CSC) leakage points, thereby enabling ophthalmologists to deliver accurate laser treatment without navigational laser equipment. METHODS: A dataset with dual labels (point-level and pixel-level) was first established based on fundus fluorescein angiography (FFA) images of CSC and subsequently divided into training (102 images), validation (40 images), and test (40 images) datasets. An intelligent segmentation method was then developed, based on the You Only Look Once version 8 Pose Estimation (YOLOv8-Pose) model and the segment anything model (SAM), to segment CSC leakage points. Next, the YOLOv8-Pose model was trained for 200 epochs, and the best-performing model was selected to form the optimal combination with SAM. Additionally, five classic U-Net series models [i.e., U-Net, recurrent residual U-Net (R2U-Net), attention U-Net (AttU-Net), recurrent residual attention U-Net (R2AttUNet), and nested U-Net (UNet++)] were initialized with three random seeds and trained for 200 epochs, resulting in a total of 15 baseline models for comparison. Finally, based on metrics including the Dice similarity coefficient (DICE), intersection over union (IoU), precision, recall, precision-recall (PR) curve, and receiver operating characteristic (ROC) curve, the proposed method was compared with the baseline models through quantitative and qualitative experiments for leakage point segmentation, thereby demonstrating its effectiveness. RESULTS: As the number of training epochs increased, the mAP50-95, recall, and precision of the YOLOv8-Pose model increased significantly and then tended to stabilize, and the model achieved a preliminary localization success rate of 90% (i.e., 36 images) for CSC leakage points in 40 test images. Using expert-annotated pixel-level labels as the ground truth, the proposed method achieved a DICE of 57.13%, an IoU of 45.31%, a precision of 45.91%, a recall of 93.57%, an area under the PR curve (AUC-PR) of 0.78, and an area under the ROC curve (AUC-ROC) of 0.97, which enables more accurate segmentation of CSC leakage points. CONCLUSION: By combining the precise localization capability of the YOLOv8-Pose model with the robust and flexible segmentation ability of SAM, the proposed method not only demonstrates the effectiveness of the YOLOv8-Pose model in detecting keypoint coordinates of CSC leakage points from the perspective of application innovation but also establishes a novel approach for accurate segmentation of CSC leakage points through the "detect-then-segment" strategy, thereby providing a potential auxiliary means for the automatic and precise real-time localization of leakage points during traditional laser photocoagulation for CSC.
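For readers unfamiliar with the two overlap metrics reported above, here is a minimal sketch of how DICE and IoU are typically computed for one binary mask pair. The function name `dice_and_iou`, the tiny masks, and the convention of returning 1.0 for two empty masks are assumptions for illustration; the abstract does not describe the authors' evaluation code.

```python
def dice_and_iou(pred, truth):
    """Dice coefficient and IoU for two equal-shaped 2D masks of 0/1 values."""
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in truth for v in row]
    inter = sum(a and b for a, b in zip(p, t))  # pixels marked in both masks
    union = sum(a or b for a, b in zip(p, t))   # pixels marked in either mask
    total = sum(p) + sum(t)
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


# Hypothetical 2x3 masks: 3 predicted pixels, 3 true pixels, 2 in common.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]

dice, iou = dice_and_iou(pred, truth)
print(round(dice, 4), iou)  # 0.6667 0.5
```

For a single mask pair the two metrics are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.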
Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R410), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Human object detection and recognition is essential for elderly monitoring and assisted living; however, models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings. To address this, we present SCENET-3D, a transformer-driven multimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline. In the first stage, scene analysis, rich geometric and texture descriptors are extracted from RGB frames, including surface-normal histograms, angles between neighboring normals, Zernike moments, directional standard deviation, and Gabor-filter responses. In the second stage, scene-object analysis, non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information. In the third stage, human-pose estimation, silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints, which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons. Features from all three stages are fused and fed into a transformer encoder with multi-head attention to resolve visually similar activities. Experiments on UCLA (95.8%), ETRI-Activity3D (89.4%), and CAD-120 (91.2%) demonstrate that combining pseudo-3D skeletons with rich scene-object fusion significantly improves generalizable activity recognition, enabling safer elderly care, natural human–robot interaction, and robust context-aware robotic perception in real-world environments.
Funding: Supported in part by the National Natural Science Foundation of China (62306141); in part by the Jiangsu Funding Program for Excellent Postdoctoral Talent (2022ZB269); in part by the Natural Science Foundation of Jiangsu Province (BK20220939); in part by the China Postdoctoral Science Foundation (2022M721629); and in part by the Research Project of University Natural Science Fund of Jiangsu Province (22KJB520002).
Abstract: Forecasting 3-dimensional skeleton-based human poses from a historical sequence is a classic task that shows enormous potential in robotics, computer vision, and graphics. Currently, state-of-the-art methods resort to graph convolutional networks (GCNs) to capture the relationships between pairs of human joints when formulating this problem. However, human action involves complex interactions among multiple joints, which constitute higher-order correlations beyond the pairwise (2-order) connections of GCNs. Moreover, joints are typically activated by their parent joint rather than driving it, whereas existing methods ignore this specific direction of information transmission. In this work, we propose a novel hybrid directed hypergraph convolution network (H-DHGCN) to model the high-order relationships of the human skeleton with directionality. Specifically, our H-DHGCN mainly involves two core components. The first is the static directed hypergraph, which is pre-defined according to the human body structure to effectively leverage the natural relations of human joints. The second is the dynamic directed hypergraph (D-DHG), which is learnable and can be constructed adaptively to learn the unique characteristics of the motion sequence. In contrast to typical GCNs, our method provides a richer and more refined topological representation of skeleton data. On several large-scale benchmarks, experimental results show that the proposed model consistently surpasses the latest techniques.
Abstract: Aortic regurgitation (AR) poses distinct challenges in interventional cardiology, necessitating novel approaches for treatment. This editorial examined the evolving landscape of transcatheter aortic valve replacement (TAVR) as an alternative therapeutic strategy for AR, particularly in patients deemed high risk for surgery. We explored the anatomical and pathophysiological disparities between AR and aortic stenosis (AS) and elucidated the technical nuances of TAVR procedures in AR patients, emphasizing the need for precise prosthesis positioning and considerations for excessive stroke volume. Additionally, we discussed the safety and efficacy of TAVR compared to SAVR in AR management, drawing insights from recent case series and registry data. Notably, dedicated TAVR devices tailored for AR, such as the J-Valve and JenaValve, demonstrate promising outcomes in reducing residual AR and ensuring procedural success. Conversely, "off-label" TAVR devices, including balloon-expandable and self-expandable platforms, offer feasible alternatives, particularly for large aortic annuli, with favorable device success rates and low residual AR rates. We highlighted the need for further research, including randomized trials, to delineate the definitive role of TAVR in AR treatment and to address remaining questions regarding device selection and long-term outcomes. In conclusion, TAVR emerges as a viable option for patients with AR, particularly those facing high surgical risks or frailty, with ongoing investigations poised to refine its position in the therapeutic armamentarium.
Funding: Funded by the Jiangsu Province Postgraduate Scientific Research and Practice Innovation Program project (SJCX240449) and the Nanjing University of Information Science and Technology Talent Startup Fund (2022r078).
Abstract: Camera pose estimation from point and line correspondences is critical in various applications, including robotics, augmented reality, 3D reconstruction, and autonomous navigation. Existing methods, such as the Perspective-n-Point (PnP) and Perspective-n-Line (PnL) approaches, offer limited accuracy and robustness in environments with occlusions, noise, or sparse feature data. This paper presents a unified solution, Efficient and Accurate Pose Estimation from Point and Line Correspondences (EAPnPL), combining point-based and line-based constraints to improve pose estimation accuracy and computational efficiency, particularly in low-altitude UAV navigation and obstacle avoidance. The proposed method utilizes quaternion parameterization of the rotation matrix to overcome singularity issues and address challenges in traditional rotation matrix-based formulations. A hybrid optimization framework is developed to integrate both point and line constraints, providing a more robust and stable solution in complex scenarios. The method is evaluated using synthetic and real-world datasets, demonstrating significant improvements in performance over existing techniques. The results indicate that the EAPnPL method enhances accuracy and reduces computational complexity, making it suitable for real-time applications in autonomous UAV systems. This approach offers a promising solution to the limitations of existing camera pose estimation methods, with potential applications in low-altitude navigation, autonomous robotics, and 3D scene reconstruction.
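The quaternion parameterization mentioned above can be illustrated with the standard conversion from a unit quaternion to a rotation matrix. This is a sketch of the textbook formula only, not the EAPnPL formulation; the scalar-first `(w, x, y, z)` ordering and the helper names `quat_to_rot` and `rotate` are assumptions made here.

```python
import math


def quat_to_rot(q):
    """3x3 rotation matrix from a quaternion given as (w, x, y, z), scalar first."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalise so the result is a rotation
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(R, v):
    """Apply the 3x3 matrix R to the 3-vector v."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


# A 90-degree rotation about the z-axis maps the x-axis onto the y-axis.
half_angle = math.pi / 4
R = quat_to_rot((math.cos(half_angle), 0.0, 0.0, math.sin(half_angle)))
print([round(c, 6) for c in rotate(R, [1.0, 0.0, 0.0])])
```

Every entry of the matrix is a polynomial in the four quaternion components, so there is no trigonometric singularity of the kind that Euler-angle parameterizations exhibit; the only constraint to maintain is unit norm.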
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52475280) and the Shaanxi Provincial Natural Science Basic Research Program (Grant No. 2025SYSSYSZD-105).
Abstract: Robots are key to expanding the scope of space applications. The end-to-end training for robot vision-based detection and precision operations is challenging owing to constraints such as extreme environments and high computational overhead. This study proposes a lightweight integrated framework for grasp detection and imitation learning, named GD-IL; it comprises a grasp detection algorithm based on manipulability and a Gaussian mixture model (manipulability-GMM), and a grasp trajectory generation algorithm based on a two-stage robot imitation learning algorithm (TS-RIL). In the manipulability-GMM algorithm, we apply GMM clustering and ellipse regression to the object point cloud, propose two judgment criteria to generate multiple candidate grasp bounding boxes for the robot, and use manipulability as a metric for selecting the optimal grasp bounding box. The stages of the TS-RIL algorithm are grasp trajectory learning and robot pose optimization. In the first stage, the robot grasp trajectory is characterized using a second-order dynamic movement primitive model and Gaussian mixture regression (GMR). By adjusting the function form of the forcing term, the robot closely approximates the target-grasping trajectory. In the second stage, a robot pose optimization model is built based on the derived pose error formula and manipulability metric. This model allows the robot to adjust its configuration in real time while grasping, thereby effectively avoiding singularities. Finally, an algorithm verification platform is developed based on the Robot Operating System, and a series of comparative experiments are conducted in real-world scenarios. The experimental results demonstrate that GD-IL significantly improves the effectiveness and robustness of grasp detection and trajectory imitation learning, outperforming existing state-of-the-art methods in execution efficiency, manipulability, and success rate.
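The abstract uses manipulability both to rank grasp candidates and to steer the robot away from singularities, without stating which definition is meant. The sketch below assumes the widely used measure w = sqrt(det(J Jᵀ)) and evaluates it for a hypothetical planar two-link arm; the arm, the link lengths, and the function names are illustrative and not drawn from the paper.

```python
import math


def planar_2link_jacobian(q1, q2, l1=1.0, l2=1.0):
    """Position Jacobian (2x2) of a hypothetical planar arm with joint angles q1, q2."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ]


def manipulability(J):
    """Assumed measure sqrt(det(J J^T)) for a 2x2 Jacobian."""
    (a, b), (c, d) = J
    # Entries of the symmetric matrix J J^T.
    m11 = a * a + b * b
    m12 = a * c + b * d
    m22 = c * c + d * d
    det = m11 * m22 - m12 * m12
    return math.sqrt(max(det, 0.0))  # clamp tiny negative round-off


# Fully stretched arm (q2 = 0) is singular; a right-angle elbow is best conditioned.
print(manipulability(planar_2link_jacobian(0.3, 0.0)))
print(manipulability(planar_2link_jacobian(0.3, math.pi / 2)))
```

For this arm the measure reduces to |l1·l2·sin(q2)|, so it drops to zero exactly when the elbow is straight, which is why maximizing it keeps a configuration away from singularities.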
Funding: Supported in part by the National Natural Science Foundation of China under Grants 61973065, U20A20197, and 61973063.
Abstract: Previous multi-view 3D human pose estimation methods neither correlate different human joints in each view nor explicitly model learnable correlations between the same joints in different views, meaning that skeleton structure information is not utilized and multi-view pose information is not completely fused. Moreover, existing graph convolutional operations do not consider the specificity of different joints and different views of pose information when processing skeleton graphs, so the correlation weights between nodes in the graph and their neighborhood nodes are shared. Existing Graph Convolutional Networks (GCNs) cannot efficiently extract global and deep-level skeleton structure information and view correlations. To solve these problems, pre-estimated multi-view 2D poses are designed as a multi-view skeleton graph that explicitly fuses skeleton priors and view correlations to handle the occlusion problem, with the skeleton-edge and symmetry-edge representing the structure correlations between adjacent joints in each view of the skeleton graph and the view-edge representing the view correlations between the same joints in different views. To make the graph convolution operation mine elaborate and sufficient skeleton structure information and view correlations, different correlation weights are assigned to different categories of neighborhood nodes and further assigned to each node in the graph. Based on the graph convolution operation proposed above, a Residual Graph Convolution (RGC) module is designed as the basic module to be combined with the simplified Hourglass architecture to construct the Hourglass-GCN as our 3D pose estimation network. Hourglass-GCN, with a symmetrical and concise architecture, processes three scales of multi-view skeleton graphs to efficiently extract local-to-global scale and shallow-to-deep level skeleton features. Experimental results on the common large 3D pose datasets Human3.6M and MPI-INF-3DHP show that Hourglass-GCN outperforms some excellent methods in 3D pose estimation accuracy.
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62071345.
Abstract: Self-supervised monocular depth estimation has emerged as a major research focus in recent years, primarily because it eliminates the dependence on ground-truth depth. However, the prevailing architectures in this domain suffer from inherent limitations: existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions. These assumptions are often violated in real-world scenarios due to dynamic objects, non-Lambertian reflectance, and unstructured background elements, leading to pervasive artifacts such as depth discontinuities ("holes"), structural collapse, and ambiguous reconstruction. To address these challenges, we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network, enhancing its ability to model complex scene dynamics. Our contributions are threefold: (1) a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations; (2) a physically-informed loss function that couples dynamic pose and depth predictions, designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles; (3) an efficient SE(3) transformation parameterization that streamlines network complexity and temporal pre-processing. Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity, significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.