Funding: This work was supported by the National Natural Science Foundation of China (61871046, SM, http://www.nsfc.gov.cn/).
Abstract: Image keypoint detection and description is a popular method for finding pixel-level correspondences between images, a basic and critical step in many computer vision tasks. Existing methods are far from optimal in terms of keypoint localization accuracy and the generation of robust, discriminative descriptors. This paper proposes a new end-to-end, self-supervised deep learning network. The network uses a backbone feature encoder to extract multi-level feature maps, then performs joint keypoint detection and description in a single forward pass. On the one hand, to improve keypoint localization accuracy and restore local shape structure, the detector detects keypoints on feature maps of the same resolution as the original image. On the other hand, to improve the perception of local shape details, the network uses multi-level features to generate robust descriptors rich in local shape information. A detailed comparison on HPatches with the traditional feature-based methods Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF), and with deep learning methods, demonstrates the effectiveness and robustness of the proposed method.
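To make the idea of pixel-level correspondences concrete, the sketch below matches two sets of descriptors by mutual nearest neighbor, a common generic matching rule. It is not taken from the paper: the function names and the toy two-dimensional descriptors are assumptions for illustration only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(descriptor, candidates):
    """Index of the candidate descriptor closest to `descriptor`."""
    return min(range(len(candidates)),
               key=lambda j: squared_distance(descriptor, candidates[j]))


def mutual_matches(desc_a, desc_b):
    """Keep pairs (i, j) where j is i's nearest neighbor and i is j's."""
    a_to_b = [nearest_index(d, desc_b) for d in desc_a]
    b_to_a = [nearest_index(d, desc_a) for d in desc_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Toy descriptors (hypothetical values, two dimensions for readability).
image_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_b = [(1.1, 0.9), (0.1, 0.0)]
matches = mutual_matches(image_a, image_b)
```

The third descriptor of the first image has no partner in the second image, so the mutual check discards it rather than forcing a match.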
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 12302252 and 12472189) and the Research Program of National University of Defense Technology, China (No. ZK24-31).
Abstract: Vision-based relative pose estimation plays a pivotal role in various space missions. Deep learning enhances monocular spacecraft pose estimation, but its high computational demands necessitate model simplification for onboard systems. In this paper, we aim to achieve an optimal balance between accuracy and computational efficiency. We present a Perspective-n-Point (PnP) based method for spacecraft pose estimation that leverages lightweight neural networks to localize semantic keypoints and reduce the computational load. Since the accuracy of keypoint localization is closely related to the heatmap resolution, we devise an efficient upsampling module that increases the resolution of the heatmaps with minimal overhead. Furthermore, the heatmaps predicted by lightweight models tend to be very noisy. To tackle this issue, we propose a weighting strategy based on an analysis of the statistical characteristics of the predicted semantic keypoints, which substantially improves the pose estimation accuracy. Experiments on the SPEED dataset underscore the promise of our method for engineering applications. We reduce the model parameters to 0.7 M, merely 2.5% of that required by the top-performing method, while achieving lower pose estimation error and better real-time performance.
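The heatmap-to-keypoint step and the idea of weighting keypoints can be sketched as follows. The abstract does not say which statistics the weighting strategy uses, so the peak-score normalization here is only an assumption, and `decode_heatmap` and `confidence_weights` are hypothetical names.

```python
def decode_heatmap(heatmap):
    """Return (x, y, peak) at the maximum of a 2D heatmap (list of rows)."""
    best_x, best_y, best_v = 0, 0, heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_v:
                best_x, best_y, best_v = x, y, value
    return best_x, best_y, best_v


def confidence_weights(peaks):
    """Normalize peak scores to sum to 1 (assumed weighting, not the paper's)."""
    total = sum(peaks)
    if total == 0:
        # No information at all: fall back to uniform weights.
        return [1.0 / len(peaks)] * len(peaks)
    return [p / total for p in peaks]


# One tiny heatmap per semantic keypoint (hypothetical values).
heatmaps = [
    [[0.0, 0.1], [0.8, 0.2]],
    [[0.3, 0.0], [0.0, 0.1]],
]
decoded = [decode_heatmap(h) for h in heatmaps]
weights = confidence_weights([peak for _, _, peak in decoded])
```

In a PnP pipeline, weights of this kind could down-weight 2D-3D correspondences whose heatmap peak is weak before the pose is solved.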
Funding: Supported by the Anhui Provincial College Students’ Innovation and Entrepreneurship Training Program, No. S202310367063.
Abstract: BACKGROUND: Coronary computed tomography angiography (CCTA) is essential for diagnosing coronary artery disease, as it provides detailed images of the heart’s blood vessels to identify blockages or abnormalities. Traditionally, determining the computed tomography (CT) scanning range has relied on manual methods due to limited automation in this area. AIM: To develop and evaluate a novel deep learning approach that automates the determination of CCTA scan ranges using anteroposterior scout images. METHODS: A retrospective analysis was conducted on chest CT data from 1388 patients at the Radiology Department of the First Affiliated Hospital of a university-affiliated hospital, collected between February 27 and March 27, 2024. A deep learning model was trained on anteroposterior scout images with annotations based on CCTA standards. The dataset was split into training (672 cases), validation (167 cases), and test (167 cases) sets to ensure robust model evaluation. RESULTS: The model performed exceptionally well on the test set, achieving a mean average precision at an IoU threshold of 0.5 (mAP50) of 0.995 and an mAP50-95 of 0.994 for determining CCTA scan ranges. CONCLUSION: This study demonstrates that: (1) anteroposterior scout images can effectively estimate CCTA scan ranges; and (2) estimates can be dynamically adjusted to meet the needs of various medical institutions.
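Both reported metrics rest on the intersection over union (IoU) between a predicted scan-range box and its annotation: mAP50 counts a prediction as correct when IoU is at least 0.5, and mAP50-95 averages over thresholds from 0.50 to 0.95 in steps of 0.05. A minimal sketch of that matching test, assuming axis-aligned boxes given as (x1, y1, x2, y2) with hypothetical coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


# Thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95.
thresholds = [0.5 + 0.05 * k for k in range(10)]
```

Full mAP additionally ranks predictions by confidence and integrates the precision-recall curve; the sketch covers only the per-box matching test.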
Abstract: This article presents a method for describing key points using simple statistics over regions controlled by neighboring key points, to remedy a gap in existing descriptors. Existing detectors and descriptors such as speeded-up robust features (SURF), KAZE, binary robust invariant scalable keypoints (BRISK), features from accelerated segment test (FAST), and oriented FAST and rotated BRIEF (ORB) can competently detect, describe, and match images in the presence of artifacts such as blur, compression, and illumination changes. However, their performance and reliability decrease under imaging variations such as changes in point of view, zoom (scale), and rotation. The introduced description method improves image matching under such distortions. It uses a contourlet-based detector to detect the strongest key points within a specified window size. The selected key points and their neighbors control the size and orientation of the surrounding regions, which are mapped onto rectangular shapes using a polar transformation. The resulting rectangular matrices are subjected to two-directional statistical operations that calculate the mean and standard deviation. Consequently, the resulting descriptor is invariant to translation, rotation, and scale because of two techniques used in this paper: region extraction and polar transformation. The description method is tested against well-established descriptors such as SURF, KAZE, BRISK, FAST, and ORB on the standard Oxford dataset. The presented method improves matching between distorted images compared with other descriptors in the literature.
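The polar mapping and two-directional statistics can be sketched roughly as below. This is an illustrative reading of the abstract, not the authors' implementation: the sampling grid, nearest-neighbor lookup, and the names `polar_patch` and `describe` are assumptions. A rotation about the key point becomes a cyclic shift along the angular axis of the polar patch, which is why the per-radius mean and standard deviation are unaffected by it.

```python
import math


def polar_patch(image, cx, cy, radius, n_r=4, n_theta=8):
    """Sample a circular region onto an n_r x n_theta rectangle."""
    height, width = len(image), len(image[0])
    patch = []
    for i in range(n_r):
        r = radius * (i + 1) / n_r
        row = []
        for j in range(n_theta):
            theta = 2 * math.pi * j / n_theta
            # Nearest-neighbor lookup, clamped to the image bounds.
            x = min(max(int(round(cx + r * math.cos(theta))), 0), width - 1)
            y = min(max(int(round(cy + r * math.sin(theta))), 0), height - 1)
            row.append(image[y][x])
        patch.append(row)
    return patch


def mean_std(values):
    """Mean and population standard deviation of a sequence."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def describe(patch):
    """Concatenate (mean, std) over each radius row, then each angle column."""
    descriptor = []
    for row in patch:
        descriptor.extend(mean_std(row))
    for column in zip(*patch):
        descriptor.extend(mean_std(column))
    return descriptor


# A constant 9 x 9 image as a trivial input (hypothetical data).
flat_image = [[5] * 9 for _ in range(9)]
flat_descriptor = describe(polar_patch(flat_image, 4, 4, 3))
```

With the default grid the descriptor has 2 × (4 + 8) = 24 entries; for the constant image every mean is 5 and every standard deviation is 0.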
Funding: Supported by the National Natural Science Foundation of China (Grant Number 62076246).
Abstract: Human pose estimation aims to localize body joints from image or video data. With the development of deep learning, pose estimation has become a hot research topic in computer vision. In recent years, human pose estimation has achieved great success in fields such as animation and sports. However, to obtain accurate positioning results, existing methods may suffer from large model sizes, a high number of parameters, and increased complexity, leading to high computing costs. In this paper, we propose a new lightweight feature encoder to construct a high-resolution network that reduces the number of parameters and lowers the computing cost. We also introduce a semantic enhancement module that improves global feature extraction and network performance by combining the channel and spatial dimensions. Furthermore, we propose a densely connected spatial pyramid pooling module to compensate for the decrease in image resolution and the information loss in the network. As a result, our method effectively reduces the number of parameters and the complexity while maintaining high performance. Extensive experiments show that our method achieves competitive performance while dramatically reducing the number of parameters and the operational complexity. Specifically, our method obtains an 89.9% AP score on MPII VAL, while the number of parameters and the complexity of operations are reduced by 41% and 36%, respectively.
Funding: Supported by the Natural Science Foundation of Hubei Province of China under grant number 2022CFB536, the National Natural Science Foundation of China under grant number 62367006, and the 15th Graduate Education Innovation Fund of Wuhan Institute of Technology under grant number CX2023579.
Abstract: Human pose estimation is a critical research area in computer vision, playing a significant role in applications such as human-computer interaction, behavior analysis, and action recognition. In this paper, we propose a U-shaped keypoint detection network (DAUNet) based on an improved ResNet subsampling structure and a spatial grouping mechanism. This network addresses key challenges in traditional methods, such as information loss, large network redundancy, and insufficient sensitivity to low-resolution features. DAUNet is composed of three main components. First, we introduce an improved BottleNeck block that employs partial convolution and strip pooling to reduce the computational load and mitigate feature loss. Second, after upsampling, the network eliminates redundant features, improving overall efficiency. Finally, a lightweight spatial grouping attention mechanism is applied to enhance low-resolution semantic features within the feature map, allowing for better restoration of the original image size and higher accuracy. Experimental results demonstrate that DAUNet achieves superior accuracy compared with most existing keypoint detection models, with a mean PCKh@0.5 score of 91.6% on the MPII dataset and an AP of 76.1% on the COCO dataset. Moreover, real-world experiments further validate the robustness and generalizability of DAUNet for detecting human bodies in unknown environments, highlighting its potential for broader applications.
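For readers unfamiliar with the reported metric, PCKh@0.5 counts a predicted joint as correct when it lies within 0.5 times the head size of the ground-truth joint. The sketch below shows that computation for a single person; the coordinates and the `head_size` value are hypothetical, and dataset-specific details such as how the head size is measured or how invisible joints are handled are left out.

```python
import math


def pckh(predicted, ground_truth, head_size, alpha=0.5):
    """Fraction of joints within alpha * head_size of the ground truth."""
    limit = alpha * head_size
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= limit
    )
    return correct / len(ground_truth)


# Three joints of one person (hypothetical pixel coordinates).
pred = [(0, 0), (10, 10), (20, 20)]
truth = [(0, 1), (10, 14), (20, 20)]
score = pckh(pred, truth, head_size=6)
```

Here the threshold is 3 pixels, so the joints at distances 1 and 0 count as correct and the joint at distance 4 does not, giving a score of 2/3.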
Funding: Supported by the Natural Science Foundation of Tianjin Municipality under Grant No. 23JCYBJC01670.
Abstract: This paper systematically studies the changes in the movement behavior of Camponotus japonicus with one or two injured legs. First, a linear motion channel matching the size of the ants' legs was designed, and the movements of ants with different leg injuries were captured using high-speed cameras, yielding a comprehensive video dataset of ants moving with missing legs. Second, stable and reliable position information for keypoints on the ants' bodies and legs was obtained using low-annotation biometric keypoint detection technology. Finally, by analyzing the filtered gait data, changes in the step location areas, phase differences, and duty factors of the injured ants' remaining legs were obtained. Comparative analysis of the ants' gait characteristics revealed some common adjustment patterns in the injured states. Additionally, the study found that the loss of a foreleg had a significant impact on the ants' movement. When two legs were missing, the loss of both legs on the same side had a greater effect on movement, whereas symmetric losses on opposite sides had a lesser impact. The research provides an important reference for the subsequent design of gait adjustment algorithms for biomimetic multi-legged robots under damaged conditions.
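Two of the gait quantities named above have simple standard definitions: the duty factor is the fraction of a stride during which a leg is on the ground, and the phase difference is the touchdown lag between two legs expressed as a fraction of the stride period. The following sketch computes both from per-frame data; the frame values and function names are illustrative assumptions, not the study's pipeline.

```python
def duty_factor(stance_flags):
    """Fraction of frames in which the leg is in stance (touching ground)."""
    return sum(1 for flag in stance_flags if flag) / len(stance_flags)


def phase_difference(touchdown_a, touchdown_b, stride_period):
    """Touchdown lag of leg B behind leg A, as a fraction of one stride."""
    return ((touchdown_b - touchdown_a) / stride_period) % 1.0


# Two strides of one leg, 1 = stance and 0 = swing (hypothetical frames).
stance = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
leg_duty = duty_factor(stance)

# Touchdown frames of two legs with a 10-frame stride (hypothetical).
lag = phase_difference(10, 15, 10)
```

A lag of 0.5 means the two legs move in antiphase, as in an alternating gait; shifts in these values after a leg is lost are the kind of adjustment the study compares.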
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFD2000500).
Abstract: In greenhouse environments, using automated machines for tomato harvesting to reduce labor consumption is a future development trend. Accurate and effective visual recognition is essential for harvesting tasks. However, most current studies use several models to obtain harvesting information in multiple steps, resulting in heavy computation costs, poor real-time performance, and weak recognition precision. In this study, an improved end-to-end model, YOLOv8np-RCW, based on YOLOv8n pose is proposed to simultaneously detect tomato bunches, maturity, and keypoints using a decoupled-head structure. The model integrates a ResNet-enhanced RepVGG architecture to balance accuracy and speed, employs the CARAFE upsampling algorithm for a larger receptive field with a lightweight design, and optimizes the loss function with WIoU loss to enhance bounding box prediction, maturity detection, and keypoint extraction. Experimental results indicate that the mAP50 of the YOLOv8np-RCW model for bounding boxes and keypoints is 87.3% and 86.8%, respectively, which is 6.2% and 5.5% higher than that of the YOLOv8n pose model. Completing bunch detection, maturity assessment, and keypoint localization requires only 9.8 ms. The Euclidean distance error in keypoint detection is less than 20 pixels. Based on this model, a method is proposed to quickly determine the orientation of tomato bunches using geometric cross-product and cross-multiplication calculations on 2D keypoint information, providing guidance for the motion planning of the end-effector. In field experiments, the robot achieved a harvesting success rate of 68%, with an average time of 10.8366 seconds per tomato bunch.
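One way a 2D cross product can indicate orientation is sketched below: the sign of the cross product between the stem direction and the vector to the bunch tells on which side of the stem the bunch hangs. The abstract does not specify the keypoints or the exact calculation, so the three keypoint names and the left/right rule are assumptions. Image coordinates are assumed, with x to the right and y downward.

```python
def cross_2d(a, b):
    """z-component of the cross product of two 2D vectors."""
    return a[0] * b[1] - a[1] * b[0]


def bunch_side(stem_top, stem_bottom, bunch_center):
    """Side of the stem (in the image) on which the bunch center lies."""
    stem = (stem_bottom[0] - stem_top[0], stem_bottom[1] - stem_top[1])
    to_bunch = (bunch_center[0] - stem_top[0], bunch_center[1] - stem_top[1])
    value = cross_2d(stem, to_bunch)
    if value > 0:
        return "left"   # with y pointing down, positive means image-left
    if value < 0:
        return "right"
    return "aligned"


# Hypothetical pixel coordinates of three keypoints.
side = bunch_side((100, 50), (100, 150), (140, 120))
```

For this example the stem points straight down and the bunch center lies at a larger x, so the cross product is negative and the bunch is reported on the right, which an end-effector could use to choose its approach direction.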