To address the problem that traditional keypoint detection methods are susceptible to complex backgrounds and local image similarity, which leads to inaccurate descriptor matching and bias in visual localization, keypoints and descriptors based on cross-modality fusion are proposed and applied to camera motion estimation. A convolutional neural network detects the positions of keypoints and generates the corresponding descriptors, and pyramid convolution is used to extract multi-scale features in the network. The problem of local image similarity is addressed by capturing local and global feature information and fusing the geometric position information of keypoints to generate descriptors. In the experiments, the repeatability of the method improves by 3.7% and homography estimation improves by 1.6%. To demonstrate the practicability of the method, the visual odometry part of simultaneous localization and mapping is constructed, and the method achieves 35% higher positioning accuracy than the traditional method.
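The repeatability figure quoted above is a detector-level metric. As a rough illustration only, the sketch below computes a simplified, one-directional repeatability: keypoints from image A are warped into image B with a known homography and counted as repeated when a keypoint of B lies within `eps` pixels. The function names, the threshold `eps`, and the toy coordinates are assumptions for illustration; the abstract does not state which repeatability definition the authors used.

```python
import math

def apply_homography(H, point):
    """Map an (x, y) point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def repeatability(kps_a, kps_b, H, eps=3.0):
    """Fraction of keypoints of image A found again in image B within eps pixels.

    Simplified, one-directional variant (assumption): it ignores the
    shared-field-of-view filtering that full evaluations usually apply.
    """
    if not kps_a or not kps_b:
        return 0.0
    hits = 0
    for p in kps_a:
        q = apply_homography(H, p)
        if min(math.dist(q, r) for r in kps_b) <= eps:
            hits += 1
    return hits / len(kps_a)

# Toy data: image B is image A shifted by (10, 5) pixels.
H_shift = [[1.0, 0.0, 10.0], [0.0, 1.0, 5.0], [0.0, 0.0, 1.0]]
kps_a = [(0.0, 0.0), (20.0, 30.0), (50.0, 50.0)]
kps_b = [(10.0, 5.0), (31.0, 36.0), (200.0, 200.0)]
score = repeatability(kps_a, kps_b, H_shift)
```

In this toy case two of the three keypoints of A reappear in B, so the score is 2/3.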
With the development of society, people's requirements for clothing matching keep increasing, which places demands on clothing recommendation systems. The algorithm for understanding clothing images must therefore be sufficiently efficient and robust. We detect keypoints in clothing accurately to capture the details of clothing images. Since the joint points of a garment are similar to those of the human body, this paper uses the cascaded pyramid network (CPN), a deep neural network designed for human pose estimation, to solve the problem of keypoint detection in clothing. We first introduce the structure and characteristics of this network when detecting keypoints. We then evaluate the experimental results and verify the effectiveness of detecting clothing keypoints with CPN, with a normalized error of about 5%-7%. Finally, we analyze the influence of different backbones on keypoint detection in this network.
Big data is a comprehensive result of the development of the Internet of Things and information systems. Computer vision requires a large amount of data as the basis for research. Because skeleton data adapts well to dynamic environments and complex backgrounds, it is used in action recognition tasks. In recent years, skeleton-based action recognition has received increasing attention in the field of computer vision. The keypoints of human skeletons are therefore essential for human pose estimation and for predicting human actions. This paper proposes a skeleton point extraction method combined with object detection, which focuses on the extraction of skeleton keypoints. Extensive experiments show that the model can be combined with object detection for skeleton point extraction and that detection efficiency is improved.
Copy-move forgery is widely used to conceal or hide data in a digital image for a specific aim: some portion of the genuine image is duplicated and pasted into the same image. Copy-move forgery is therefore a significant problem and an active research area in image authentication. In this paper, a system for copy-move forgery detection is proposed. The system has two stages: a detection stage and a refined detection stage. The detection stage uses Speeded-Up Robust Features (SURF) and Binary Robust Invariant Scalable Keypoints (BRISK) for feature detection; in the refined detection stage, image registration using a non-linear transformation is used to enhance detection efficiency. Initially, the genuine image is selected, and then SURF and BRISK feature extraction are run in parallel to detect interest keypoints. This yields an appropriate number of interest points and helps ensure that the majority of the manipulated regions are found. RANSAC is employed to find the best group of matches and distinguish the manipulated parts. Then, a non-linear transformation between the best-matched sets from both feature extractors is used as an optimization to obtain the final matched set and detect the copied regions. Numerical experiments were performed on several benchmark datasets, including CASIA v2.0, MICC-220, MICC-F600, and MICC-F2000. With the proposed algorithm, an overall average detection accuracy of 95.33% is obtained across these datasets. Forgery detection achieved a True Positive Rate of 97.4% for tampered images with object translation, different degrees of rotation, and enlargement. The results on the different datasets indicate that the proposed algorithm can identify the altered areas with high reliability and can handle multiple cloning.
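The RANSAC step in the abstract above selects a consistent group of keypoint matches. The sketch below shows the idea under a strong simplifying assumption: the copied region is only translated, so a single match is enough to hypothesize the model. The function name `ransac_translation`, the tolerance `tol`, and the toy matches are hypothetical; the abstract does not specify the motion model the authors fit.

```python
import random

def ransac_translation(matches, tol=2.0, iters=100, seed=0):
    """Find the translation supported by the most matches.

    matches: list of ((x1, y1), (x2, y2)) keypoint pairs within one image.
    Returns (best_shift, inlier_indices). Translation-only model (assumption).
    """
    rng = random.Random(seed)
    best_shift, best_inliers = None, []
    for _ in range(iters):
        # Hypothesize a shift from one randomly sampled match.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Count the matches that agree with this shift within the tolerance.
        inliers = [
            i for i, ((ax, ay), (bx, by)) in enumerate(matches)
            if abs((bx - ax) - dx) <= tol and abs((by - ay) - dy) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_shift, best_inliers = (dx, dy), inliers
    return best_shift, best_inliers

# Three matches agree on a shift near (50, 30); the last two are outliers.
matches = [
    ((10, 10), (60, 40)),
    ((12, 15), (62, 45)),
    ((20, 11), (71, 41)),
    ((5, 5), (90, 3)),
    ((30, 30), (2, 80)),
]
shift, inliers = ransac_translation(matches)
```

The surviving inliers mark the source and destination of the duplicated region; rotation or enlargement, which the abstract also mentions, would need a richer model such as a similarity or affine transform.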
Vision-based relative pose estimation plays a pivotal role in various space missions. Deep learning enhances monocular spacecraft pose estimation, but high computational demands necessitate model simplification for onboard systems. In this paper, we aim to achieve an optimal balance between accuracy and computational efficiency. We present a Perspective-n-Point (PnP) based method for spacecraft pose estimation, leveraging lightweight neural networks to localize semantic keypoints and reduce computational load. Since the accuracy of keypoint localization is closely related to the heatmap resolution, we devise an efficient upsampling module to increase the resolution of heatmaps with minimal overhead. Furthermore, the heatmaps predicted by lightweight models tend to show high levels of noise. To tackle this issue, we propose a weighting strategy based on the statistical characteristics of the predicted semantic keypoints, which substantially improves pose estimation accuracy. Experiments on the SPEED dataset underscore the prospects of our method in engineering applications. We reduce the model parameters to 0.7 M, merely 2.5% of that required by the top-performing method, and achieve lower pose estimation error and better real-time performance.
BACKGROUND: Coronary computed tomography angiography (CCTA) is essential for diagnosing coronary artery disease, as it provides detailed images of the heart's blood vessels to identify blockages or abnormalities. Traditionally, determining the computed tomography (CT) scanning range has relied on manual methods due to limited automation in this area. AIM: To develop and evaluate a novel deep learning approach to automate the determination of CCTA scan ranges using anteroposterior scout images. METHODS: A retrospective analysis was conducted on chest CT data from 1388 patients at the Radiology Department of the First Affiliated Hospital of a university-affiliated hospital, collected between February 27 and March 27, 2024. A deep learning model was trained on anteroposterior scout images with annotations based on CCTA standards. The dataset was split into training (672 cases), validation (167 cases), and test (167 cases) sets to ensure robust model evaluation. RESULTS: The model performed strongly on the test set, achieving a mean average precision (mAP50) of 0.995 and an mAP50-95 of 0.994 for determining CCTA scan ranges. CONCLUSION: This study demonstrates that: (1) anteroposterior scout images can effectively estimate CCTA scan ranges; and (2) estimates can be dynamically adjusted to meet the needs of various medical institutions.
In the modern era of a growing population, it is arduous for humans to monitor every aspect of sports, the events occurring around us, and other scenarios or conditions. The recognition of different types of sports and events has increasingly incorporated machine learning and artificial intelligence. This research focuses on detecting and recognizing events in sequential photos characterized by several factors, including the size, location, and position of people's body parts in those pictures, and the influences surrounding those people. Feature descriptors such as MSER (Maximally Stable Extremal Regions), SIFT (Scale-Invariant Feature Transform), and DOF (degree of freedom) between the joint points are applied to the skeleton points. For the same purpose, other features such as BRISK (Binary Robust Invariant Scalable Keypoints), ORB (Oriented FAST and Rotated BRIEF), and HOG (Histogram of Oriented Gradients) are applied to the full body or silhouettes. The integration of these techniques increases the discriminative power of the features retrieved for event identification, improving the efficiency and reliability of the entire procedure. The extracted features are passed to early fusion and DBSCAN for feature fusion and optimization. A deep belief network is then employed for recognition. Experimental results demonstrate an average recognition rate of 87% on the HMDB51 video database and 89% on the YouTube database, comparing favorably with current methods in sports and event identification.
The research aims to improve the performance of image recognition methods that describe an image as a set of keypoint descriptors. The main focus is on increasing the speed of establishing the relevance of object and etalon descriptions while maintaining the required level of classification efficiency. The class to be recognized is represented by an infinite set of images obtained from the etalon by applying arbitrary geometric transformations. It is proposed to reduce the descriptions in the etalon database by selecting the most significant descriptor components according to an information content criterion. The informativeness of an etalon descriptor is estimated by the difference of the closest distances to its own and other descriptions. The developed method determines the relevance of the full description of the recognized object to the reduced description of the etalons. Several practical models of the classifier with different options for establishing the correspondence between object descriptors and etalons are considered. The results of experimental modeling of the proposed methods on a database of images of museum jewelry are presented. The test sample is formed as a set of images from the etalon database and from outside the database, with geometric transformations of scale and rotation applied in the field of view. The practical problem of determining the threshold for the number of votes, on which the classification decision is based, has been studied. Modeling has revealed the practical possibility of reducing descriptions tenfold with full preservation of classification accuracy. Reducing the descriptions by a factor of twenty in the experiment leads to slightly decreased accuracy. The speed of the analysis increases in proportion to the degree of reduction. The use of reduction by the informativeness criterion confirmed the possibility of obtaining the most significant subset of features for classification, which guarantees a decent level of accuracy.
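The informativeness criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions: descriptors are binary and stored as Python integers, distance is the Hamming distance, and informativeness is taken as the nearest distance to descriptors of other etalons minus the nearest distance to other descriptors of the same etalon, so that larger values mean more distinctive descriptors. The abstract does not give the sign convention or the distance measure; both are assumptions, as are the function names.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def informativeness(index, own, others):
    """Score one descriptor of an etalon (assumed sign convention).

    own: descriptors of this etalon; others: list of other etalons' descriptor lists.
    """
    d = own[index]
    rest = own[:index] + own[index + 1:]
    nearest_own = min(hamming(d, x) for x in rest)
    nearest_other = min(hamming(d, x) for desc in others for x in desc)
    return nearest_other - nearest_own

def reduce_description(own, others, keep):
    """Keep the `keep` most informative descriptors, in their original order."""
    ranked = sorted(
        range(len(own)),
        key=lambda i: informativeness(i, own, others),
        reverse=True,
    )
    return [own[i] for i in sorted(ranked[:keep])]

# Toy 8-bit descriptors for one etalon and one competing etalon.
own = [0b11110000, 0b11110001, 0b00001111, 0b10101010]
others = [[0b00001110, 0b01010101]]
reduced = reduce_description(own, others, keep=2)
```

Here the third descriptor is almost identical to one in the competing etalon, so it scores lowest and is dropped first; halving the description halves the number of descriptor comparisons at recognition time.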
Human pose estimation aims to localize the body joints from image or video data. With the development of deep learning, pose estimation has become a hot research topic in the field of computer vision. In recent years, human pose estimation has achieved great success in multiple fields such as animation and sports. However, to obtain accurate positioning results, existing methods may suffer from large model sizes, a high number of parameters, and increased complexity, leading to high computing costs. In this paper, we propose a new lightweight feature encoder to construct a high-resolution network that reduces the number of parameters and lowers the computing cost. We also introduce a semantic enhancement module that improves global feature extraction and network performance by combining channel and spatial dimensions. Furthermore, we propose a densely connected spatial pyramid pooling module to compensate for the decrease in image resolution and the information loss in the network. As a result, our method effectively reduces the number of parameters and the complexity while ensuring high performance. Extensive experiments show that our method achieves competitive performance while dramatically reducing the number of parameters and the operational complexity. Specifically, our method obtains an 89.9% AP score on MPII VAL, while the number of parameters and the complexity of operations are reduced by 41% and 36%, respectively.
Human pose estimation is a critical research area in the field of computer vision, playing a significant role in applications such as human-computer interaction, behavior analysis, and action recognition. In this paper, we propose a U-shaped keypoint detection network (DAUNet) based on an improved ResNet subsampling structure and a spatial grouping mechanism. This network addresses key challenges in traditional methods, such as information loss, large network redundancy, and insufficient sensitivity to low-resolution features. DAUNet is composed of three main components. First, we introduce an improved BottleNeck block that employs partial convolution and strip pooling to reduce computational load and mitigate feature loss. Second, after upsampling, the network eliminates redundant features, improving overall efficiency. Finally, a lightweight spatial grouping attention mechanism is applied to enhance low-resolution semantic features within the feature map, allowing for better restoration of the original image size and higher accuracy. Experimental results demonstrate that DAUNet achieves superior accuracy compared to most existing keypoint detection models, with a mean PCKh@0.5 score of 91.6% on the MPII dataset and an AP of 76.1% on the COCO dataset. Moreover, real-world experiments further validate the robustness and generalizability of DAUNet for detecting human bodies in unknown environments, highlighting its potential for broader applications.
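For readers unfamiliar with the PCKh@0.5 score quoted above: a predicted joint counts as correct when its distance to the ground-truth joint is at most 0.5 times the person's head segment length. The sketch below computes it for a single person; the function name, the `head_size` argument, and the toy coordinates are illustrative assumptions, and a benchmark evaluation would aggregate over all annotated joints in the dataset.

```python
import math

def pckh(pred, gt, head_size, alpha=0.5):
    """Fraction of joints whose error is within alpha * head_size.

    pred, gt: lists of (x, y) joint coordinates in the same order.
    head_size: head segment length in pixels for this person.
    """
    threshold = alpha * head_size
    correct = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return correct / len(gt)

# Toy example: head size 40 px, so the threshold is 20 px.
gt = [(100, 100), (150, 200), (120, 300), (180, 300)]
pred = [(103, 104), (150, 230), (121, 299), (195, 300)]
score = pckh(pred, gt, head_size=40)
```

In this toy case the second joint is 30 px off and fails the 20 px threshold, so three of four joints are correct and the score is 0.75.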
This paper systematically studies the changes in movement behavior of Camponotus japonicus with one or two injured legs. First, a linear motion channel matching the size of the ants' legs was designed, and the movements of ants with different leg injuries were captured using high-speed cameras, yielding a comprehensive video dataset of ants moving with missing legs. Second, stable and reliable motion position information for keypoints on the ants' bodies and legs was obtained using low-annotation biometric keypoint detection technology. Finally, by analyzing the filtered gait data, information was obtained about changes in the step location areas, phase differences, and duty factors of the injured ants' remaining legs. Comparative analysis of the ants' gait characteristics revealed some common adjustment patterns in the injured states. Additionally, the study found that the loss of a foreleg had a significant impact on the ants' movement. When two legs were missing, the loss of both legs on the same side had a greater effect on movement, whereas symmetric opposite-side leg loss had a lesser impact. The research provides an important reference for the subsequent design of gait adjustment algorithms for biomimetic multi-legged robots under damaged conditions.
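Objective: The resolution of unmanned aerial vehicle (UAV) video footage directly affects target recognition and information acquisition, so improving video resolution is of great significance. To improve the quality of UAV reconnaissance video, a super-resolution reconstruction method for UAV reconnaissance video is proposed, tailored to the characteristics of current UAV video and photographic data. Methods: First, a feature matching algorithm based on AGAST-Difference and Fast Retina Keypoint (FREAK) is proposed to register the target video frame with its adjacent frames. Then, a matching region search method is proposed to find the correspondence between the target frame and the aerial photograph, and the aerial photograph is used to provide high-frequency compensation for the video frame. Finally, projection onto convex sets is used to iteratively optimize the compensated video frame. Results: The feature matching algorithm based on AGAST-Difference and FREAK has clear advantages under changes in scale, rotation, and viewpoint, as well as in running speed; the matching region search method gives better continuity of high-frequency compensation for UAV video; and the iterative optimization by projection onto convex sets improves edge preservation in the reconstruction. Compared with a simple and effective super-resolution restoration algorithm for video sequences, the proposed algorithm improves reconstruction quality by about 4 dB and running speed by about 5 times. Conclusion: A video super-resolution reconstruction method for UAVs is proposed, the core of the UAV video super-resolution problem is analyzed, and a feature matching algorithm based on AGAST-Difference and FREAK together with a matching region search method is proposed to solve the image registration and high-frequency compensation problems. Experimental results show that the algorithm strengthens the consistency and fidelity of the reconstructed images, with an especially marked effect on image edge details, and that it processes faster.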
This paper proposes a novel object detection method in which a set of local features inside superpixels is extracted from an image acquired by a 3D visual sensor. To increase segmentation accuracy, the proposed method first segments the image under analysis using the Simple Linear Iterative Clustering (SLIC) superpixel method. Next, the keypoints inside each superpixel are estimated using Speeded-Up Robust Features (SURF). These keypoints are then used to carry out the matching task for every detected keypoint of a scene inside the estimated superpixels. In addition, a probability map is introduced to describe the accuracy of the object detection results. Experimental results show that the proposed approach provides fairly good object detection and confirm the superior performance of the proposed scheme compared with other recently proposed methods, such as the scheme proposed by Mae et al.
This article presents a method for describing keypoints using simple statistics over regions controlled by neighboring keypoints, to remedy a gap in existing descriptors. Existing descriptors such as speeded up robust features (SURF), Kaze, binary robust invariant scalable keypoints (BRISK), features from accelerated segment test (FAST), and oriented FAST and rotated BRIEF (ORB) can competently detect, describe, and match images in the presence of artifacts such as blur, compression, and illumination changes. However, the performance and reliability of these descriptors decrease under imaging variations such as point of view, zoom (scale), and rotation. The introduced description method improves image matching under such distortions. It uses a contourlet-based detector to detect the strongest keypoints within a specified window size. The selected keypoints and their neighbors control the size and orientation of the surrounding regions, which are mapped onto rectangular shapes using a polar transformation. The resulting rectangular matrices are subjected to two-directional statistical operations that calculate the mean and standard deviation. Consequently, the descriptor obtained is invariant to translation, rotation, and scale because of the two techniques used in this paper: the region extraction and the polar transformation. The description method is tested against well-established descriptors such as SURF, Kaze, BRISK, FAST, and ORB using the standard OXFORD dataset. The presented methodology demonstrated its ability to improve the matching of distorted images compared to other descriptors in the literature.
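The two-directional statistics mentioned above can be illustrated with a short sketch: given a rectangular matrix (standing in for a polar-mapped region), compute the mean and standard deviation along every row and every column and concatenate them. The ordering of the values in the output vector, the use of the population standard deviation, and the function name are assumptions for illustration; the abstract does not specify the descriptor layout, and the contourlet detection and polar mapping steps are omitted here.

```python
from statistics import mean, pstdev

def two_directional_stats(matrix):
    """Concatenate (mean, std) for each row, then for each column.

    matrix: list of equal-length rows of numbers. The layout of the
    returned vector is an illustrative assumption.
    """
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*rows)]
    descriptor = []
    for line in rows + cols:
        descriptor.append(mean(line))
        descriptor.append(pstdev(line))  # population standard deviation
    return descriptor

# Toy 2x3 patch standing in for a polar-mapped region.
patch = [[1, 2, 3], [4, 5, 6]]
desc = two_directional_stats(patch)
```

A 2x3 patch gives 2 row pairs and 3 column pairs, so the resulting vector has 10 values; its length depends only on the size of the rectangular grid, not on the size of the original image region.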
The results of the development of a new high-speed method for classifying images using a structural approach are presented. The method is based on a system of hierarchical features built on the bitwise data distribution for the set of descriptors of the image description. The article also proposes the use of a spatial data processing apparatus, which simplifies and accelerates the classification process. Experiments have shown that the time needed to calculate the relevance of two descriptions from their distributions is about 1000 times less than for the traditional voting procedure, in which the sets of descriptors are compared. The introduction of the system of hierarchical features further reduces the calculation time by a factor of 2–3 while ensuring high classification efficiency. The immunity of the method to additive noise has been studied experimentally. According to the results, the marginal degree of the feature hierarchy for reliable classification with a noise standard deviation of less than 30 is the 8-bit distribution. Computing costs increase proportionally with decreasing bit distribution. The method can be used for applications where object identification time is critical.
Funding (cross-modality fusion keypoints and descriptors): Supported by the National Natural Science Foundation of China (61802253).
Funding (clothing keypoint detection with CPN): National Key Research and Development Program, China (No. 2019YFC1521300).
Funding (skeleton keypoint extraction with object detection): Supported by the Hainan Provincial Key Research and Development Program (No. ZDYF2020018), the Hainan Provincial Natural Science Foundation of China (No. 2019RC100), and the Haikou Key Research and Development Program (No. 2020-049).
Funding (spacecraft pose estimation): Co-supported by the National Natural Science Foundation of China (Nos. 12302252 and 12472189) and the Research Program of National University of Defense Technology, China (No. ZK24-31).
Funding (CCTA scan range determination): Supported by the Anhui Provincial College Students' Innovation and Entrepreneurship Training Program, No. S202310367063.
Funding (sports and event recognition): The MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) Program (IITP-2024-RS-2022-00156326), the IITP (Institute of Information & Communications Technology Planning & Evaluation); Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R440), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. This research was supported by the Deanship of Scientific Research at Najran University, under the Research Group Funding program grant code (NU/RG/SERC/13/30).
Funding: This research was funded by Prince Sattam bin Abdulaziz University (Project Number PSAU/2023/01/25387).
Abstract: The research aims to improve the performance of image recognition methods based on a description in the form of a set of keypoint descriptors. The main focus is on increasing the speed of establishing the relevance of object and etalon descriptions while maintaining the required level of classification efficiency. The class to be recognized is represented by an infinite set of images obtained from the etalon by applying arbitrary geometric transformations. It is proposed to reduce the descriptions in the etalon database by selecting the most significant descriptor components according to an information content criterion. The informativeness of an etalon descriptor is estimated by the difference between the closest distances to its own description and to other descriptions. The developed method determines the relevance of the full description of the recognized object to the reduced descriptions of the etalons. Several practical classifier models with different options for establishing the correspondence between object descriptors and etalons are considered. The results of experimental modeling of the proposed methods on a database of museum jewelry images are presented. The test sample is formed as a set of images from the etalon database and from outside the database, with geometric transformations of scale and rotation applied in the field of view. The practical problem of determining the threshold for the number of votes on which a classification decision is based has been studied. Modeling has revealed the practical possibility of reducing descriptions tenfold with full preservation of classification accuracy. Reducing the descriptions twentyfold in the experiment leads to slightly decreased accuracy. The speed of the analysis increases in proportion to the degree of reduction. Reduction by the informativeness criterion confirmed the possibility of obtaining the most significant subset of features for classification, which guarantees a decent level of accuracy.
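The informativeness criterion in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes binary descriptors stored as Python integers, Hamming distance as the metric, and the sign convention "distance to the nearest foreign descriptor minus distance to the nearest descriptor of the same etalon". The function names `informativeness` and `reduce_description` are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def informativeness(idx, own, others):
    """Score one etalon descriptor (assumed convention, see lead-in).

    own    -- list of descriptors of this etalon (at least two entries)
    others -- list of descriptions (lists of descriptors) of other etalons
    """
    d = own[idx]
    # Closest distance to the rest of its own description.
    d_own = min(hamming(d, x) for j, x in enumerate(own) if j != idx)
    # Closest distance to any descriptor of any other etalon.
    d_other = min(hamming(d, x) for desc in others for x in desc)
    return d_other - d_own


def reduce_description(own, others, keep):
    """Keep the `keep` most informative descriptors, in original order."""
    scores = [(informativeness(i, own, others), i) for i in range(len(own))]
    top = sorted(scores, key=lambda s: (-s[0], s[1]))[:keep]
    return [own[i] for _, i in sorted(top, key=lambda s: s[1])]


own = [0b0000, 0b0001, 0b1111]
others = [[0b0011, 0b0111]]
reduced = reduce_description(own, others, keep=2)
```

Under these assumptions, a descriptor scores high when it lies far from every other etalon but close to its own description, so dropping low-scoring descriptors shrinks the etalon database while keeping the most class-specific ones.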
Funding: The National Natural Science Foundation of China (Grant Number 62076246).
Abstract: Human pose estimation aims to localize the body joints from image or video data. With the development of deep learning, pose estimation has become a hot research topic in the field of computer vision. In recent years, human pose estimation has achieved great success in multiple fields such as animation and sports. However, to obtain accurate positioning results, existing methods may suffer from large model sizes, a high number of parameters, and increased complexity, leading to high computing costs. In this paper, we propose a new lightweight feature encoder to construct a high-resolution network that reduces the number of parameters and lowers the computing cost. We also introduce a semantic enhancement module that improves global feature extraction and network performance by combining channel and spatial dimensions. Furthermore, we propose a densely connected spatial pyramid pooling module to compensate for the decrease in image resolution and the information loss in the network. Finally, our method effectively reduces the number of parameters and the complexity while ensuring high performance. Extensive experiments show that our method achieves competitive performance while dramatically reducing the number of parameters and the operational complexity. Specifically, our method obtains an 89.9% AP score on MPII VAL, while the number of parameters and the complexity of operations are reduced by 41% and 36%, respectively.
Funding: Supported by the Natural Science Foundation of Hubei Province of China under grant number 2022CFB536, the National Natural Science Foundation of China under grant number 62367006, and the 15th Graduate Education Innovation Fund of Wuhan Institute of Technology under grant number CX2023579.
Abstract: Human pose estimation is a critical research area in the field of computer vision, playing a significant role in applications such as human-computer interaction, behavior analysis, and action recognition. In this paper, we propose a U-shaped keypoint detection network (DAUNet) based on an improved ResNet subsampling structure and a spatial grouping mechanism. This network addresses key challenges in traditional methods, such as information loss, large network redundancy, and insufficient sensitivity to low-resolution features. DAUNet is composed of three main components. First, we introduce an improved BottleNeck block that employs partial convolution and strip pooling to reduce computational load and mitigate feature loss. Second, after upsampling, the network eliminates redundant features, improving the overall efficiency. Finally, a lightweight spatial grouping attention mechanism is applied to enhance low-resolution semantic features within the feature map, allowing for better restoration of the original image size and higher accuracy. Experimental results demonstrate that DAUNet achieves superior accuracy compared to most existing keypoint detection models, with a mean PCKh@0.5 score of 91.6% on the MPII dataset and an AP of 76.1% on the COCO dataset. Moreover, real-world experiments further validate the robustness and generalizability of DAUNet for detecting human bodies in unknown environments, highlighting its potential for broader applications.
Funding: Supported by the Natural Science Foundation of Tianjin Municipality under Grant No. 23JCYBJC01670.
Abstract: This paper systematically studies the changes in movement behavior of Camponotus japonicus with one or two injured legs. First, a linear motion channel matching the size of the ants' legs was designed, and the movements of ants with different leg injuries were captured using high-speed cameras, yielding a comprehensive video dataset of ants moving with missing legs. Second, stable and reliable position information for keypoints on the ants' bodies and legs was obtained using low-annotation biometric keypoint detection technology. Finally, by analyzing the filtered gait data, information was obtained about changes in the areas of the step location points, the phase differences, and the duty factors of the injured ants' remaining legs. Comparative analysis of the ants' gait characteristics revealed some common adjustment patterns in the injured states. Additionally, the study found that the loss of a foreleg had a significant impact on the ants' movement. When two legs were missing, the loss of both legs on the same side had a greater effect on movement, whereas symmetric loss of legs on opposite sides had a lesser impact. The research provides an important reference for the subsequent design of gait adjustment algorithms for biomimetic multi-legged robots under damaged conditions.
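The gait quantities named in this abstract can be sketched from per-frame ground-contact flags. The abstract does not give its formulas, so this assumes the standard definitions: the duty factor is the fraction of time a leg spends in stance, and the phase difference is the touchdown delay of a leg relative to a reference leg, divided by the reference stride period. The boolean stance sequences and function names are hypothetical.

```python
def duty_factor(stance):
    """Fraction of frames in which the leg is on the ground."""
    return sum(stance) / len(stance)


def touchdown_frames(stance):
    """Frame indices where the leg switches from swing to stance."""
    return [i for i in range(1, len(stance)) if stance[i] and not stance[i - 1]]


def phase_difference(ref, leg):
    """Touchdown delay of `leg` after `ref`, as a fraction of ref's stride.

    Assumes `ref` contains at least two touchdowns and `leg` touches
    down at least once after the first reference touchdown.
    """
    ref_td = touchdown_frames(ref)
    leg_td = touchdown_frames(leg)
    period = ref_td[1] - ref_td[0]
    first = next(t for t in leg_td if t >= ref_td[0])
    return ((first - ref_td[0]) % period) / period


T, F = True, False
ref_leg = [F, T, T, T, F, F, F, T, T, T, F, F]    # touchdowns at frames 1 and 7
other_leg = [F, F, F, F, T, T, T, F, F, F, T, T]  # touchdowns at frames 4 and 10
df = duty_factor(ref_leg)
phase = phase_difference(ref_leg, other_leg)
```

In this toy sequence the second leg touches down half a stride after the reference leg, the pattern expected between the two tripods of an intact alternating-tripod gait. Comparing such values before and after leg loss is one way to quantify gait adjustment.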
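Abstract: Objective: The resolution of UAV video footage directly affects target recognition and information acquisition, so improving it is of great significance. To improve the quality of UAV reconnaissance video, a super-resolution reconstruction method for UAV reconnaissance video is proposed, based on the characteristics of current UAV video and photographic data. Methods: First, a feature matching algorithm based on AGAST-Difference and Fast Retina Keypoint (FREAK) is proposed to register the target video frame with its adjacent frames. Next, a matching-region search method is proposed to find the correspondence between the target frame and the aerial photograph, and the aerial photograph is used to apply high-frequency compensation to the video frame. Finally, the projection onto convex sets (POCS) method is used to iteratively optimize the compensated video frame. Results: The feature matching algorithm based on AGAST-Difference and FREAK has considerable advantages under changes in scale, rotation, and viewpoint, as well as in running speed. The matching-region search method gives the high-frequency compensation of UAV video better continuity, and the POCS iterative optimization improves the edge-preserving ability of the reconstruction. Compared with a simple and effective super-resolution restoration algorithm for video sequences, the proposed algorithm improves reconstruction quality by about 4 dB and runs about 5 times faster. Conclusion: A video super-resolution reconstruction method for UAVs is proposed, and the core of the UAV video super-resolution problem is analyzed. A feature matching algorithm based on AGAST-Difference and FREAK and a matching-region search method are proposed to solve the image registration and high-frequency compensation problems. Experimental results show that the proposed algorithm strengthens the consistency and fidelity of the reconstructed image, with a particularly pronounced effect on image edge details, and that it processes faster.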
Abstract: This paper proposes a novel object detection method in which a set of local features inside superpixels is extracted from an image acquired by a 3D visual sensor. To increase the segmentation accuracy, the proposed method first segments the image using the Simple Linear Iterative Clustering (SLIC) superpixel method. Next, the keypoints inside each superpixel are estimated using Speeded-Up Robust Features (SURF). These keypoints are then used to carry out the matching task for every detected keypoint of a scene inside the estimated superpixels. In addition, a probability map is introduced to describe the accuracy of the object detection results. Experimental results show that the proposed approach provides fairly good object detection and confirm the superior performance of the proposed scheme compared with other recently proposed methods, such as the scheme proposed by Mae et al.
Abstract: This article presents a method for describing keypoints using simple statistics over regions controlled by neighboring keypoints, to remedy a gap in existing descriptors. Existing descriptors such as speeded up robust features (SURF), Kaze, binary robust invariant scalable keypoints (BRISK), features from accelerated segment test (FAST), and oriented FAST and rotated BRIEF (ORB) can competently detect, describe, and match images in the presence of some artifacts such as blur, compression, and illumination. However, the performance and reliability of these descriptors decrease for some imaging variations such as point of view, zoom (scale), and rotation. The introduced description method improves image matching in the event of such distortions. It utilizes a contourlet-based detector to detect the strongest keypoints within a specified window size. The selected keypoints and their neighbors control the size and orientation of the surrounding regions, which are mapped onto rectangular shapes using a polar transformation. The resulting rectangular matrices are subjected to two-directional statistical operations that involve calculating the mean and standard deviation. Consequently, the descriptor obtained is invariant to translation, rotation, and scale because of two techniques used in this paper: the region extraction and the polar transformation. The description method introduced in this article is tested against well-established descriptors such as SURF, Kaze, BRISK, FAST, and ORB using the standard OXFORD dataset. The presented methodology demonstrated its ability to improve the matching of distorted images compared to other descriptors in the literature.
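The polar mapping and two-directional statistics described in this abstract can be sketched as follows. This is a simplified illustration, not the published descriptor: it assumes a grayscale image stored as a list of rows, nearest-neighbour sampling, and a fixed radius and start angle `theta0` in place of the size and orientation that the paper derives from neighboring keypoints. The names `polar_unwrap` and `stat_descriptor` are hypothetical.

```python
import math
from statistics import mean, pstdev


def polar_unwrap(img, cx, cy, radius, n_r=4, n_theta=8, theta0=0.0):
    """Map the disc around (cx, cy) onto an n_r x n_theta rectangle.

    Rows correspond to rings of increasing radius, columns to angles.
    Samples outside the image are clamped to the nearest border pixel.
    """
    h, w = len(img), len(img[0])
    rows = []
    for i in range(n_r):
        r = radius * (i + 1) / n_r
        row = []
        for j in range(n_theta):
            t = theta0 + 2 * math.pi * j / n_theta
            x = min(max(int(round(cx + r * math.cos(t))), 0), w - 1)
            y = min(max(int(round(cy + r * math.sin(t))), 0), h - 1)
            row.append(img[y][x])
        rows.append(row)
    return rows


def stat_descriptor(mat):
    """Mean and standard deviation along rows, then along columns."""
    cols = [list(c) for c in zip(*mat)]
    desc = []
    for line in list(mat) + cols:
        desc.append(mean(line))
        desc.append(pstdev(line))
    return desc


flat = [[5] * 9 for _ in range(9)]           # constant 9x9 test image
patch = polar_unwrap(flat, cx=4, cy=4, radius=4)
desc = stat_descriptor(patch)                # 2 * (4 rings + 8 angles) values
```

In this layout a rotation of the region becomes a cyclic shift of the columns and a change of scale is absorbed by the choice of radius, which is the intuition behind the invariance claimed in the abstract.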
Abstract: The results of developing a new fast method for classifying images using a structural approach are presented. The method is based on a system of hierarchical features built on the bitwise data distribution of the set of descriptors in an image description. The article also proposes the use of a spatial data processing apparatus, which simplifies and accelerates the classification process. Experiments have shown that calculating the relevance of two descriptions from their distributions takes about 1000 times less time than the traditional voting procedure, in which the sets of descriptors are compared. Introducing the system of hierarchical features further reduces the calculation time by a factor of 2–3 while ensuring high classification efficiency. The immunity of the method to additive noise has been studied experimentally. According to the results of the research, the marginal degree of the feature hierarchy for reliable classification with a noise standard deviation of less than 30 is the 8-bit distribution. Computing costs increase proportionally with decreasing bit distribution. The method can be used in applied tasks where object identification time is critical.
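The idea of comparing descriptions through their bitwise distributions instead of by descriptor-to-descriptor voting can be sketched as below. This is only an illustration under stated assumptions: descriptors are binary strings stored as integers, the distribution is the per-bit fraction of ones, and the distance between distributions is the Manhattan distance. The hierarchical feature system from the abstract is not modelled, and all function names are hypothetical.

```python
def bit_distribution(descriptors, n_bits):
    """Fraction of descriptors that have each bit set (bit 0 first)."""
    counts = [0] * n_bits
    for d in descriptors:
        for b in range(n_bits):
            counts[b] += (d >> b) & 1
    return [c / len(descriptors) for c in counts]


def distribution_distance(p, q):
    """Manhattan distance between two bit distributions (an assumption)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def classify(obj, etalons, n_bits):
    """Return the name of the etalon whose distribution is closest."""
    p = bit_distribution(obj, n_bits)
    return min(
        etalons,
        key=lambda name: distribution_distance(
            p, bit_distribution(etalons[name], n_bits)
        ),
    )


etalons = {"a": [0b0001, 0b0011], "b": [0b1100, 0b1000]}
obj = [0b0001, 0b0001]
label = classify(obj, etalons, n_bits=4)
```

Each comparison here costs on the order of `n_bits` operations once the distributions are stored, whereas voting compares every object descriptor with every etalon descriptor. That difference is the kind of saving the abstract reports, although the measured factor of about 1000 belongs to the authors' own experiments.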