Abstract: This article presents a method for describing keypoints using simple statistics of regions controlled by neighboring keypoints, to remedy a gap in existing descriptors. Existing descriptors such as speeded up robust features (SURF), KAZE, binary robust invariant scalable keypoints (BRISK), features from accelerated segment test (FAST), and oriented FAST and rotated BRIEF (ORB) can competently detect, describe, and match images in the presence of artifacts such as blur, compression, and illumination changes. However, their performance and reliability decrease under imaging variations such as viewpoint, zoom (scale), and rotation. The introduced description method improves image matching under such distortions. It uses a contourlet-based detector to detect the strongest keypoints within a specified window size. The selected keypoints and their neighbors control the size and orientation of the surrounding regions, which are mapped onto rectangular shapes using a polar transformation. The resulting rectangular matrices are subjected to two-directional statistical operations that compute the mean and standard deviation. Consequently, the resulting descriptor is invariant to translation, rotation, and scale because of the two techniques used in this paper: the region extraction and the polar transformation. The description method is tested against well-established descriptors such as SURF, KAZE, BRISK, FAST, and ORB using the standard Oxford dataset. The presented methodology improved matching between distorted images compared with other descriptors in the literature.
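The pipeline described above (neighbor-controlled region, polar unwrapping, row and column statistics) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, a single neighboring keypoint is assumed to fix both the region radius and the reference angle, sampling is bilinear on a grayscale image stored as a list of rows, and the contourlet-based detector is not reproduced (keypoints are passed in). Because the radius follows the neighbor distance and the angular axis starts at the neighbor direction, rotating or scaling the image rotates or scales the sampling grid with it, which is where the claimed invariance comes from.

```python
import math
from statistics import mean, pstdev


def bilinear(img, x, y):
    """Sample a grayscale image (list of rows) at real-valued (x, y), clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def polar_region(img, kp, neighbor, n_r=8, n_theta=16):
    """Unwrap the disc around kp into an n_r x n_theta matrix (rows = radii, cols = angles).

    Assumption: one neighboring keypoint sets both the radius and the start angle.
    """
    (cx, cy), (nx, ny) = kp, neighbor
    radius = math.hypot(nx - cx, ny - cy)  # region size from the neighbor distance
    theta0 = math.atan2(ny - cy, nx - cx)  # reference angle from the neighbor direction
    rect = []
    for i in range(n_r):
        r = radius * (i + 1) / n_r
        rect.append([
            bilinear(img,
                     cx + r * math.cos(theta0 + 2 * math.pi * j / n_theta),
                     cy + r * math.sin(theta0 + 2 * math.pi * j / n_theta))
            for j in range(n_theta)
        ])
    return rect


def two_directional_stats(rect):
    """Concatenate per-row and per-column mean and population standard deviation."""
    cols = [list(c) for c in zip(*rect)]
    return ([mean(row) for row in rect] + [pstdev(row) for row in rect]
            + [mean(col) for col in cols] + [pstdev(col) for col in cols])
```

With the default grid, `two_directional_stats(polar_region(img, kp, neighbor))` returns 48 values: 8 row means, 8 row deviations, 16 column means, and 16 column deviations.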
Abstract: Copy-move forgery is widely used to conceal data in a digital image for a specific purpose: a portion of the genuine image is duplicated and pasted elsewhere in the same image. Copy-move forgery detection is therefore an important problem and an active research area for verifying image authenticity. This paper proposes a copy-move forgery detection system composed of two stages: a detection stage and a refined detection stage. The detection stage uses Speeded-Up Robust Features (SURF) and Binary Robust Invariant Scalable Keypoints (BRISK) for feature detection, and the refined detection stage uses image registration with a non-linear transformation to improve detection efficiency. First, the input image is loaded, and SURF and BRISK feature extraction are run in parallel to detect interest keypoints. This yields an adequate number of interest points and helps ensure that most of the manipulated regions are found. RANSAC is then employed to find the best group of matches and separate the manipulated parts. Next, a non-linear transformation between the best-matched sets from the two feature extractors is used as an optimization step to obtain the final matched set and detect the copied regions. Numerical experiments were performed on several benchmark datasets: CASIA v2.0, MICC-F220, MICC-F600, and MICC-F2000. The proposed algorithm achieves an overall average detection accuracy of 95.33% on these datasets and a true positive rate of 97.4% for tampered images involving object translation, various degrees of rotation, and enlargement. These results across multiple datasets show that the proposed algorithm can locate altered areas with high reliability and can handle multiple cloning.
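The RANSAC step can be illustrated with a small, self-contained sketch. The abstract does not specify the motion model, so this example assumes a two-point similarity model (rotation, uniform scale, translation), which covers the translation, rotation, and enlargement cases mentioned; points are encoded as complex numbers so the model is simply `w = a*z + b`. The function names, iteration count, and 2-pixel tolerance are illustrative assumptions, and the later non-linear registration stage is not shown.

```python
import random


def fit_similarity(pair_a, pair_b):
    """Solve w = a*z + b (rotation + uniform scale + translation) from two matches.

    Points are complex numbers x + 1j*y; returns None for a degenerate sample.
    """
    (z1, w1), (z2, w2) = pair_a, pair_b
    if z1 == z2:
        return None
    a = (w2 - w1) / (z2 - z1)
    return a, w1 - a * z1


def ransac_similarity(matches, iters=500, tol=2.0, seed=0):
    """Keep the largest set of (source, target) matches consistent with one similarity."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        model = fit_similarity(*rng.sample(matches, 2))  # minimal sample: two matches
        if model is None:
            continue
        a, b = model
        inliers = [(z, w) for z, w in matches if abs(a * z + b - w) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return best
```

In a copy-move setting, each match pairs two keypoints from the same image, so the surviving inliers outline the source and the pasted copy of one cloned region.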
Funding: Supported by the National Natural Science Foundation of China (61802253).
Abstract: Traditional keypoint detection methods are susceptible to complex backgrounds and to local similarity within images, which leads to inaccurate descriptor matching and biased visual localization. To address this problem, keypoints and descriptors based on cross-modality fusion are proposed and applied to camera motion estimation. A convolutional neural network detects keypoint positions and generates the corresponding descriptors, and pyramid convolution is used to extract multi-scale features within the network. The local-similarity problem is addressed by capturing both local and global feature information and fusing the geometric position information of the keypoints into the descriptors. In our experiments, the method improves repeatability by 3.7% and homography estimation by 1.6%. To demonstrate its practicality, we construct the visual odometry component of a simultaneous localization and mapping (SLAM) system, in which our method achieves 35% higher positioning accuracy than the traditional method.
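The abstract reports a 1.6% gain in homography estimation without defining the metric. A commonly used criterion, assumed here, warps the four image corners with both the estimated and the ground-truth homography and counts an estimate as correct when the mean corner distance is at most ε pixels. The sketch below implements that criterion; the function names and the default ε = 3 are hypothetical choices, not taken from the paper.

```python
import math


def apply_homography(H, x, y):
    """Project (x, y) with a 3x3 homography given as nested lists."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def mean_corner_error(H_est, H_gt, width, height):
    """Average distance between image corners warped by the estimated and true homographies."""
    corners = [(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)]
    errors = [math.dist(apply_homography(H_est, x, y), apply_homography(H_gt, x, y))
              for x, y in corners]
    return sum(errors) / len(errors)


def homography_accuracy(pairs, width, height, eps=3.0):
    """Fraction of (H_est, H_gt) pairs whose mean corner error is at most eps pixels."""
    correct = sum(mean_corner_error(He, Hg, width, height) <= eps for He, Hg in pairs)
    return correct / len(pairs)
```

For example, an estimate that is off by a pure 2-pixel shift passes at ε = 3, while a 5-pixel shift fails, so those two estimates together score an accuracy of 0.5.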
Abstract: The repeatability rate is an important measure for evaluating and comparing the performance of keypoint detectors, and several repeatability rate measures have been used in the literature to assess them. Although these rates are calculated for pairs of images, the general assumption is that the reference image is known and fixed relative to the other images in the same dataset. These rates are therefore asymmetric, since they are computed in only one direction. In addition, the image domain in which they are computed substantially affects their values. The presented scatter plots show how these directional repeatability rates vary with the size of the neighborhood region in each image pair. Therefore, both directional repeatability rates for the same image pair must be included when comparing keypoint detectors. This paper first examines several commonly used repeatability rate measures for keypoint detector evaluation. It then proposes computing a two-fold repeatability rate to assess keypoint detector performance on images of similar scenes. Next, a symmetric mean repeatability rate is computed from the two-fold repeatability rates. Finally, these measures are validated using well-known keypoint detectors on image groups with various geometric and photometric attributes.
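A minimal sketch of the two-fold idea follows. The exact definitions are not given in the abstract, so this assumes that the directional rate from image 1 to image 2 is the fraction of image-1 keypoints that project inside image 2's domain and land within `eps` pixels of some image-2 keypoint, and that the symmetric mean is the arithmetic mean of the two directional rates. The homographies `H12` and `H21` (its inverse) are assumed known, as in Oxford-style benchmarks; all names are hypothetical.

```python
import math


def project(H, p):
    """Map point p = (x, y) through a 3x3 homography given as nested lists."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def directional_repeatability(src, dst, H, dst_size, eps=3.0):
    """Share of src keypoints that land inside dst's image and near a dst keypoint."""
    width, height = dst_size
    projected = [project(H, p) for p in src]
    # Only points that fall inside the other image's domain can be repeated there.
    visible = [q for q in projected if 0 <= q[0] <= width - 1 and 0 <= q[1] <= height - 1]
    if not visible:
        return 0.0
    hits = sum(any(math.dist(q, d) <= eps for d in dst) for q in visible)
    return hits / len(visible)


def two_fold_repeatability(kp1, kp2, H12, H21, size1, size2, eps=3.0):
    """Return both directional rates and their (assumed arithmetic) symmetric mean."""
    r12 = directional_repeatability(kp1, kp2, H12, size2, eps)
    r21 = directional_repeatability(kp2, kp1, H21, size1, eps)
    return r12, r21, (r12 + r21) / 2
```

With a 10-pixel horizontal shift between two 100 × 100 images, keypoints `[(5, 5), (50, 50), (95, 50)]` and `[(60, 50), (15, 5), (30, 30)]` give 1.0 in one direction and 2/3 in the other, which is exactly the asymmetry the abstract describes; sweeping `eps` shows the dependence on neighborhood size.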
Funding: This work was supported by the National Natural Science Foundation of China (61871046, SM, http://www.nsfc.gov.cn/).
Abstract: Image keypoint detection and description is a popular way to establish pixel-level correspondences between images and is a basic, critical step in many computer vision tasks. Existing methods are far from optimal in keypoint localization accuracy and in generating robust, discriminative descriptors. This paper proposes a new end-to-end, self-supervised deep learning network. The network uses a backbone feature encoder to extract multi-level feature maps and then performs joint keypoint detection and description in a single forward pass. On the one hand, to improve keypoint localization accuracy and restore local shape structure, the detector detects keypoints on feature maps with the same resolution as the original image. On the other hand, to improve the perception of local shape details, the network uses multi-level features to generate robust descriptors rich in local shape information. A detailed comparison on HPatches with traditional feature-based methods, Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), and with deep learning methods demonstrates the effectiveness and robustness of the proposed method.
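Detected keypoints and their descriptors are typically turned into pixel-level correspondences by descriptor matching. As a generic illustration (not the paper's matcher), the sketch below performs mutual nearest-neighbor matching under Euclidean distance, a common choice in HPatches-style evaluation; descriptors are assumed to be equal-length tuples of floats, and the function name is hypothetical.

```python
import math


def mutual_nn_matches(desc1, desc2):
    """Index pairs (i, j) where desc1[i] and desc2[j] are each other's nearest neighbor."""
    if not desc1 or not desc2:
        return []
    # Nearest neighbor of every descriptor in the other set (L2 distance).
    nn12 = [min(range(len(desc2)), key=lambda j: math.dist(d, desc2[j])) for d in desc1]
    nn21 = [min(range(len(desc1)), key=lambda i: math.dist(d, desc1[i])) for d in desc2]
    # Keep only pairs that agree in both directions.
    return [(i, j) for i, j in enumerate(nn12) if nn21[j] == i]
```

For instance, matching `[(0, 0), (10, 0), (0, 10)]` against `[(0.5, 0), (0, 9.5), (20, 25)]` keeps only `(0, 0)` and `(2, 1)`: the second query also prefers descriptor 0, but descriptor 0 prefers the first query, so that pair is rejected.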
Funding: National Key Research and Development Program, China (No. 2019YFC1521300).
Abstract: As society develops, people's expectations for clothing matching keep rising, which places higher demands on clothing recommendation systems. The algorithms that interpret clothing images must therefore be sufficiently efficient and robust, so we detect clothing keypoints accurately to capture the details of clothing images. Since the joint points of a garment resemble those of the human body, this paper applies a deep neural network designed for human pose estimation, the cascaded pyramid network (CPN), to clothing keypoint detection. We first introduce the structure and characteristics of this network for keypoint detection. We then evaluate the experimental results and verify that CPN detects clothing keypoints effectively, with a normalized error of about 5% to 7%. Finally, we analyze the influence of different backbones on keypoint detection in this network.
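The reported 5% to 7% figure is a normalized error. The abstract does not define the normalizer, so this sketch assumes a common form: the mean Euclidean distance between predicted and ground-truth keypoints over visible keypoints only, with each distance divided by a per-image reference length (for example, a distance between two garment landmarks) supplied by the caller. The function name and input layout are hypothetical.

```python
import math


def normalized_error(samples):
    """Mean normalized keypoint error over all visible keypoints.

    samples: iterable of (pred, gt, visible, norm) per image, where pred and gt are
    lists of (x, y), visible is a list of bools, and norm is a reference length in pixels.
    """
    total, count = 0.0, 0
    for pred, gt, visible, norm in samples:
        for p, g, v in zip(pred, gt, visible):
            if v:  # invisible or unlabeled keypoints are skipped
                total += math.dist(p, g) / norm
                count += 1
    return total / count if count else 0.0
```

Under this definition, a returned value between 0.05 and 0.07 corresponds to the 5% to 7% range quoted above.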
Funding: Supported by the Hainan Provincial Key Research and Development Program (No. ZDYF2020018), the Hainan Provincial Natural Science Foundation of China (No. 2019RC100), and the Haikou Key Research and Development Program (No. 2020-049).
Abstract: Big data is a comprehensive result of the development of the Internet of Things and information systems, and computer vision research requires large amounts of data. Because skeleton data adapts well to dynamic environments and complex backgrounds, it is used in action recognition tasks, and skeleton-based action recognition has received increasing attention in computer vision in recent years. Human skeleton keypoints are therefore essential for describing human pose and for recognizing human actions. This paper proposes a skeleton point extraction method combined with object detection, which focuses the extraction on skeleton keypoints. Extensive experiments show that our model can be combined with object detection for skeleton point extraction and that detection efficiency is improved.
Funding: This work was supported in part by the Tibet Shigatse Science and Technology Projects (No. RKZ2024ZY-03), the Shandong Province Modern Agricultural Industry Technology System, China (No. SDAIT-18-06), the China Agriculture Research System of MOF and MARA (No. CARS-18-ZJ0402), and the National Natural Science Foundation of China (No. 32001419).
Abstract: Weed growth significantly impacts corn yield. As weed control technologies continue to develop, achieving more effective and precise weed management has become a major challenge in corn production. To achieve precise weed suppression, this study proposes a growth point detection method based on a keypoint pose estimation model that can detect various weeds and locate their growth points during the 2nd to 5th leaf stages of corn. To cope with the complex working environment of precision weeding machines in corn fields, including occlusion, dense growth, and variable lighting, we design a dilation-wise residual module (DWRM) for the detector and a separation and enhancement attention module (SEAM) for pose estimation. Furthermore, because computational resources in the field are limited, we introduce the RepViT block (RVB) to make the model lightweight. The proposed method, SRD-YOLO, was evaluated on a purpose-built corn field dataset. It achieved an mAPkpt of 96.5%, an F1 score of 94%, and 169 FPS, while reducing the number of model parameters by 8.7M. SRD-YOLO meets the requirements for growth point localization under challenging conditions, providing robust technical support for real-time, precise weed control in corn fields.
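Keypoint mAP (mAPkpt) for YOLO-style pose models is commonly computed as in the COCO keypoint benchmark: each prediction is scored against ground truth with object keypoint similarity (OKS), counted as a true positive when OKS exceeds a threshold, and AP is averaged over thresholds from 0.50 to 0.95. The abstract does not state SRD-YOLO's exact protocol, so treat this as an assumption. The sketch computes OKS for a single instance; `area` is the object's area in pixels and `sigmas` are per-keypoint falloff constants, which for custom keypoints such as weed growth points would have to be chosen by the user. The function name is hypothetical.

```python
import math


def oks(pred, gt, visible, area, sigmas):
    """Object keypoint similarity between one predicted and one ground-truth instance."""
    total, n = 0.0, 0
    for (px, py), (gx, gy), v, s in zip(pred, gt, visible, sigmas):
        if not v:  # only labeled ground-truth keypoints contribute
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2 * s) ** 2  # COCO-style constant k_i = 2 * sigma_i
        total += math.exp(-d2 / (2 * area * k2))
        n += 1
    return total / n if n else 0.0
```

A perfect prediction scores 1.0, and the score decays smoothly with pixel error relative to object size, which is why small weeds with tight bounding areas are penalized more heavily for the same pixel offset.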
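Abstract: Occlusion in human pose estimation leads to a lack of guidance from low-level image features and to predicted poses that are inconsistent with human physiological structure. To address these problems, a novel generative human pose estimation method (GenPose) is proposed. The model handles severe occlusion with a multi-scale information fusion module and a conditional generation module. The multi-scale module fuses image features at a fine-grained level across scales and channels, capturing more limb details and thereby inferring feature information for occluded keypoints. The conditional generation module models the correspondence between occlusion scenes and poses and dynamically adjusts the generated pose according to the token encoder features; while maintaining accuracy on visible keypoints, it reduces to some extent the interference of occlusion with non-occluded keypoints and improves the generation of occluded poses. On the public COCO and MPII datasets, the method achieves better results than previous methods, and its generalization ability is verified on the CrowdPose, OCHuman, and SyncOCC datasets. The model can, to a certain extent, solve pose estimation under severe occlusion, improve the plausibility of predicted poses, and achieve superior results.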