Abstract: This article presents a method for describing key points using simple statistics over regions controlled by neighboring key points, remedying a gap in existing descriptors. Existing descriptors such as speeded-up robust features (SURF), Kaze, binary robust invariant scalable keypoints (BRISK), features from accelerated segment test (FAST), and oriented FAST and rotated BRIEF (ORB) can competently detect, describe, and match images in the presence of artifacts such as blur, compression, and illumination changes. However, their performance and reliability decrease under imaging variations such as changes in viewpoint, zoom (scale), and rotation. The introduced description method improves image matching in the event of such distortions. It utilizes a contourlet-based detector to detect the strongest key points within a specified window size. The selected key points and their neighbors control the size and orientation of the surrounding regions, which are mapped onto rectangular shapes using a polar transformation. The resulting rectangular matrices are subjected to two-directional statistical operations that calculate the mean and standard deviation. Consequently, the obtained descriptor is invariant to translation, rotation, and scale because of the two techniques used in this paper: the region-extraction method and the polar transformation. The introduced description method is tested against well-established descriptors, namely SURF, Kaze, BRISK, FAST, and ORB, on the standard Oxford dataset. The presented methodology demonstrated its ability to improve matching between distorted images compared with other descriptors in the literature.
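The core of the description step above, sampling the region around a key point on a polar grid and reducing the resulting rectangular matrix with row- and column-wise mean and standard deviation, can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the sampling density (`n_r`, `n_theta`) and nearest-neighbour interpolation are assumptions made here for brevity.

```python
import numpy as np

def polar_descriptor(image, center, radius, n_r=16, n_theta=32):
    """Sample a circular region around `center` on a polar grid, then
    describe it with two-directional (row/column) mean and std statistics."""
    cy, cx = center
    rs = np.linspace(1.0, radius, n_r)                    # radial samples
    thetas = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    # Rectangular matrix from the polar transform: rows = radii, cols = angles.
    ys = cy + rs[:, None] * np.sin(thetas[None, :])
    xs = cx + rs[:, None] * np.cos(thetas[None, :])
    # Nearest-neighbour sampling, clamped to the image border.
    ys = np.clip(np.rint(ys).astype(int), 0, image.shape[0] - 1)
    xs = np.clip(np.rint(xs).astype(int), 0, image.shape[1] - 1)
    rect = image[ys, xs].astype(float)
    # Two-directional statistics. Note: a rotation of the image cyclically
    # shifts the columns, so the per-radius stats (axis=1) are unchanged,
    # while the per-angle stats (axis=0) are only shifted, not destroyed.
    return np.concatenate([
        rect.mean(axis=0), rect.std(axis=0),              # per-angle stats
        rect.mean(axis=1), rect.std(axis=1),              # per-radius stats
    ])
```

With the defaults the descriptor has length 2 × 32 + 2 × 16 = 96; on a constant image every mean entry equals the constant and every std entry is zero, which is a quick sanity check of the two-directional reduction.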
Abstract: Copy-move forgery is widely used to conceal or hide data in a digital image for a specific aim: a portion of the genuine image is duplicated and pasted elsewhere within the same image. Copy-move forgery detection is therefore a significant problem and an active research area for verifying the authenticity of an image. In this paper, a system for copy-move forgery detection is proposed. The proposed system is composed of two stages: a detection stage and a refined detection stage. The detection stage is executed using Speeded-Up Robust Features (SURF) and Binary Robust Invariant Scalable Keypoints (BRISK) for feature detection, and in the refined detection stage, image registration using a non-linear transformation is used to enhance detection efficiency. Initially, the genuine image is selected, and both SURF and BRISK feature extraction are run in parallel to detect the interest keypoints. This yields an adequate number of interest points and helps ensure that the majority of the manipulated regions are found. RANSAC is employed to find the best group of matches to differentiate the manipulated parts. Then, a non-linear transformation between the best-matched sets from both feature extractors is used as an optimization to obtain the best-matched set and detect the copied regions. A number of numerical experiments were performed using benchmark datasets such as CASIA v2.0, MICC-220, MICC-F600, and MICC-F2000. With the proposed algorithm, an overall average detection accuracy of 95.33% is obtained across these databases. Forgery detection achieved a true positive rate of 97.4% for tampered images with object translation, different degrees of rotation, and enlargement. Results from the different datasets thus show that the proposed algorithm can localize the altered areas with high reliability and can deal with multiple cloning.
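The RANSAC filtering step described above can be illustrated with a deliberately simplified sketch. The paper matches SURF/BRISK keypoints within the same image; here the motion model is reduced to a pure 2-D offset between the source and pasted regions, which is an assumption made for brevity (copy-move with rotation or scaling would need an affine model instead).

```python
import numpy as np

def ransac_translation(src, dst, n_iters=200, tol=2.0, seed=0):
    """Estimate the dominant 2-D offset between matched keypoint sets.
    src, dst: (N, 2) arrays of matched coordinates within the same image."""
    rng = np.random.default_rng(seed)
    best_inliers, best_t = 0, np.zeros(2)
    for _ in range(n_iters):
        i = rng.integers(len(src))            # minimal sample: one match
        t = dst[i] - src[i]                   # candidate offset hypothesis
        err = np.linalg.norm(dst - (src + t), axis=1)
        n_in = int((err < tol).sum())
        if n_in > best_inliers:
            best_inliers, best_t = n_in, t
    # Refit the offset on the consensus set for a stable final estimate.
    mask = np.linalg.norm(dst - (src + best_t), axis=1) < tol
    t_hat = dst[mask].mean(axis=0) - src[mask].mean(axis=0)
    return t_hat, mask
```

Matches consistent with the dominant offset are the candidate copied region; the outliers rejected by the mask are the spurious matches that local self-similarity inevitably produces.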
Funding: Supported by the National Natural Science Foundation of China (61802253).
Abstract: To address the problem that traditional keypoint detection methods are susceptible to complex backgrounds and local image similarity, resulting in inaccurate descriptor matching and bias in visual localization, keypoints and descriptors based on cross-modality fusion are proposed and applied to camera motion estimation. A convolutional neural network is used to detect keypoint positions and generate the corresponding descriptors, and pyramid convolution is used to extract multi-scale features in the network. The problem of local image similarity is addressed by capturing local and global feature information and fusing the geometric position information of keypoints into the generated descriptors. In our experiments, repeatability improves by 3.7% and homography estimation by 1.6%. To demonstrate the practicability of the method, the visual odometry component of a simultaneous localization and mapping system is constructed, and our method achieves 35% higher positioning accuracy than the traditional method.
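One simple way to read "fusing the geometric position information of keypoints into the descriptors" is to concatenate normalized coordinates onto the appearance vector, so two locally similar patches at different image positions no longer collide during matching. The sketch below is an interpretation for illustration only; the weighting factor `alpha` and the concatenation scheme are assumptions, not the paper's architecture.

```python
import numpy as np

def fuse_position(descriptor, xy, image_size, alpha=0.1):
    """Append normalized keypoint coordinates to an appearance descriptor,
    so locally similar patches at different positions become separable."""
    pos = np.asarray(xy, float) / np.asarray(image_size, float)   # in [0, 1]
    fused = np.concatenate([np.asarray(descriptor, float), alpha * pos])
    return fused / (np.linalg.norm(fused) + 1e-12)                # re-normalize
```

With this fusion, two identical appearance descriptors taken at distant positions produce distinct fused vectors, while `alpha` keeps the positional term from dominating appearance.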
Abstract: The repeatability rate is an important measure for evaluating and comparing the performance of keypoint detectors. Several repeatability rate measurements have been used in the literature to assess the effectiveness of keypoint detectors. While these repeatability rates are calculated for pairs of images, the general assumption is that the reference image is known and fixed relative to the other images in the same dataset. These rates are therefore asymmetric, as they require calculations in only one direction. In addition, the image domain in which these computations take place substantially affects their values. The presented scatter plots illustrate how these directional repeatability rates vary with the size of the neighboring region in each image pair. Therefore, both directional repeatability rates for the same image pair should be included when comparing keypoint detectors. This paper first examines several commonly used repeatability rate measures for keypoint detector evaluation. The researcher then suggests computing a two-fold repeatability rate to assess keypoint detector performance on similar scene images. Next, a symmetric mean repeatability rate metric is computed from the given two-fold repeatability rates. Finally, these measurements are validated using well-known keypoint detectors on different image groups with various geometric and photometric attributes.
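The two-fold (directional) repeatability and its symmetric mean can be sketched as follows, assuming a ground-truth homography maps keypoints between the two images. The neighborhood threshold `eps` is an illustrative parameter; the exact overlap criterion and domain handling in the paper are not reproduced here.

```python
import numpy as np

def directional_repeatability(kps_a, kps_b, H_ab, eps=3.0):
    """Fraction of keypoints in image A whose projection into image B
    lands within `eps` pixels of some keypoint detected in B."""
    pts = np.hstack([kps_a, np.ones((len(kps_a), 1))]) @ H_ab.T
    proj = pts[:, :2] / pts[:, 2:3]                    # homogeneous divide
    d = np.linalg.norm(proj[:, None, :] - kps_b[None, :, :], axis=2)
    return (d.min(axis=1) < eps).mean()

def symmetric_mean_repeatability(kps_a, kps_b, H_ab):
    """Average the two directional rates so neither image is privileged."""
    r_ab = directional_repeatability(kps_a, kps_b, H_ab)
    r_ba = directional_repeatability(kps_b, kps_a, np.linalg.inv(H_ab))
    return 0.5 * (r_ab + r_ba)
```

The asymmetry the abstract points out is easy to see: if image B contains extra keypoints with no counterpart in A, the A→B rate can be 1.0 while the B→A rate is strictly lower, and only the symmetric mean reflects both.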
Funding: This work was supported by the National Natural Science Foundation of China (61871046, SM, http://www.nsfc.gov.cn/).
Abstract: Image keypoint detection and description is a popular way to find pixel-level correspondences between images, which is a basic and critical step in many computer vision tasks. Existing methods are far from optimal in terms of keypoint localization accuracy and the generation of robust, discriminative descriptors. This paper proposes a new end-to-end self-supervised deep learning network. The network uses a backbone feature encoder to extract multi-level feature maps, then performs joint keypoint detection and description in a single forward pass. On the one hand, to enhance the localization accuracy of keypoints and restore local shape structure, the detector operates on feature maps at the same resolution as the original image. On the other hand, to enhance the ability to perceive local shape details, the network utilizes multi-level features to generate robust descriptors rich in local shape information. A detailed comparison on HPatches with the traditional feature-based methods Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF) and with deep learning methods demonstrates the effectiveness and robustness of the proposed method.
Funding: National Key Research and Development Program, China (No. 2019YFC1521300).
Abstract: With the development of society, people's requirements for clothing matching keep rising, and clothing recommendation systems must keep pace. This requires that algorithms for understanding clothing images be sufficiently efficient and robust. We therefore detect keypoints in clothing accurately to capture the details of clothing images. Since the joint points of a garment are similar to those of the human body, this paper applies a deep neural network designed for human pose estimation, the cascaded pyramid network (CPN), to the problem of keypoint detection in clothing. We first introduce the structure and characteristics of this network for keypoint detection. We then evaluate the experimental results and verify the effectiveness of detecting clothing keypoints with CPN, achieving a normalized error of about 5%–7%. Finally, we analyze the influence of different backbones on keypoint detection in this network.
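The normalized error (NE) figure quoted above is, in its usual form, the mean keypoint-to-ground-truth distance divided by a per-image normalization length. The sketch below uses the image diagonal or a reference distance passed in as `norm_len`; the abstract does not specify the paper's exact normalizer, so this choice is an assumption.

```python
import numpy as np

def normalized_error(pred, gt, norm_len, visible=None):
    """Mean Euclidean keypoint error divided by a normalization length.
    pred, gt: (K, 2) arrays; visible: optional boolean mask over keypoints
    (occluded clothing landmarks are typically excluded from the average)."""
    err = np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=1)
    if visible is not None:
        err = err[visible]
    return float(err.mean() / norm_len)
```

For example, a constant 5-pixel localization error under a 100-pixel normalization length gives NE = 0.05, i.e. the 5% end of the range reported in the abstract.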
Funding: Supported by the Hainan Provincial Key Research and Development Program (No. ZDYF2020018), the Hainan Provincial Natural Science Foundation of China (No. 2019RC100), and the Haikou Key Research and Development Program (No. 2020-049).
Abstract: Big data is a comprehensive result of the development of the Internet of Things and information systems, and computer vision requires large amounts of data as the basis for research. Because skeleton data adapt well to dynamic environments and complex backgrounds, they are used in action recognition tasks. In recent years, skeleton-based action recognition has received increasing attention in computer vision. The keypoints of the human skeleton are therefore essential for describing human pose estimation and predicting human actions. This paper proposes a skeleton keypoint extraction method combined with object detection, which focuses the extraction on skeleton keypoints. Extensive experiments show that our model can be combined with object detection for skeleton point extraction and that detection efficiency is improved.
Funding: Supported in part by the National Key Research and Development Program of China (2023YFD-1202600), the National Natural Science Foundation of China (62103380), the Research and Development Project from the Department of Science and Technology of Zhejiang Province (2023C01042), Soybean Intelligent Computational Breeding and Application of the Zhejiang Lab (2021PE0AC04), Intelligent Technology and Platform Development for Rice Breeding of the Zhejiang Lab (2021PE0AC05), and Fine-grained Semantic Modeling and Cross-modal Encoding-Decoding for Multilingual Scene Text Extraction (2022M722911).
Abstract: Pod and seed counts are important yield-related traits in soybean. High-precision soybean breeders face the major challenge of accurately phenotyping the number of pods and seeds in a high-throughput manner. Recent advances in artificial intelligence, especially deep learning (DL) models, have provided new avenues for high-throughput phenotyping of crop traits with increased precision. However, available DL models are less effective for phenotyping pods that are densely packed and overlapping on in situ soybean plants; accurately phenotyping the number of pods and seeds on a soybean plant thus remains an important challenge. To address this challenge, the present study proposes a bottom-up model, DEKR-SPrior (disentangled keypoint regression with a structural prior), for in situ soybean pod phenotyping, which treats soybean pods and seeds as analogous to human bodies and joints, respectively. In particular, we designed a novel structural prior (SPrior) module that utilizes cosine similarity to improve feature discrimination, which is important for differentiating closely located, highly similar seeds. To further enhance the accuracy of pod localization, we cropped full-sized images into smaller, high-resolution sub-images for analysis. The results on our image datasets revealed that DEKR-SPrior outperformed multiple bottom-up models, viz., Lightweight-OpenPose, OpenPose, HigherHRNet, and DEKR, reducing the mean absolute error from 25.81 (original DEKR) to 21.11 (DEKR-SPrior) in pod phenotyping. This paper demonstrates the great potential of DEKR-SPrior for plant phenotyping, and we hope it will support future plant phenotyping work.
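At its core, the SPrior module's use of cosine similarity means comparing seed feature vectors by angle rather than magnitude, which makes the comparison insensitive to overall activation strength. The one-line sketch below shows only this primitive, not the module's architecture, which the abstract does not detail.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Angle-based similarity in [-1, 1]; insensitive to vector magnitude,
    so two seeds with equally strong but differently oriented features
    are still distinguishable."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```

Scaling a feature vector leaves its cosine similarity to any other vector unchanged, which is exactly the property that lets the module focus discrimination on feature direction.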
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 12302252 and 12472189) and the Research Program of the National University of Defense Technology, China (No. ZK24-31).
Abstract: Vision-based relative pose estimation plays a pivotal role in various space missions. Deep learning enhances monocular spacecraft pose estimation, but high computational demands necessitate model simplification for onboard systems. In this paper, we aim to achieve an optimal balance between accuracy and computational efficiency. We present a Perspective-n-Point (PnP)-based method for spacecraft pose estimation, leveraging lightweight neural networks to localize semantic keypoints and reduce computational load. Since the accuracy of keypoint localization is closely related to heatmap resolution, we devise an efficient upsampling module that increases heatmap resolution with minimal overhead. Furthermore, the heatmaps predicted by lightweight models tend to show high levels of noise. To tackle this issue, we propose a weighting strategy based on the statistical characteristics of the predicted semantic keypoints, which substantially improves pose estimation accuracy. Experiments on the SPEED dataset underscore the prospects of our method for engineering applications. We dramatically reduce the model parameters to 0.7 M, merely 2.5% of that required by the top-performing method, while achieving lower pose estimation error and better real-time performance.
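The idea of weighting keypoints by the statistics of their predicted heatmaps can be illustrated with a soft-argmax readout: each keypoint is the expectation of its normalized heatmap, and its peak response serves as a confidence weight so that noisy, low-confidence heatmaps contribute less to the downstream PnP solve. This is a hedged sketch of the general technique, not the paper's exact weighting scheme.

```python
import numpy as np

def weighted_keypoints(heatmaps):
    """heatmaps: (K, H, W) non-negative responses, one per semantic keypoint.
    Returns (K, 2) soft-argmax locations (x, y) and normalized weights."""
    K, H, W = heatmaps.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pts, weights = [], []
    for hm in heatmaps:
        p = hm / (hm.sum() + 1e-12)                   # treat as a distribution
        pts.append([(p * xs).sum(), (p * ys).sum()])  # expected (x, y) location
        weights.append(hm.max())                      # peak response = confidence
    w = np.asarray(weights)
    return np.asarray(pts), w / (w.sum() + 1e-12)     # weights sum to 1
```

A PnP solver can then scale each keypoint's reprojection residual by its weight, so a keypoint whose heatmap is flat and noisy barely influences the estimated pose.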