The coarse-to-fine pyramid and the scale space are two important image structures in the realm of image matching. However, the advantage of the coarse-to-fine pyramid is usually neglected, because the pyramid structure is typically constructed with the down-sampling method of scale space. In addition, the importance of each lattice differs even within a single image. Based on these analyses, a new multi-pyramid (M-P) image spatial structure is constructed. First, the coarse-to-fine pyramid is built by partitioning the original image into increasingly finer lattices, and the number of interest points is adopted as each lattice's non-normalized weight on each pyramid level. Second, the scale space of each lattice on each pyramid level is generated with the classic Gaussian kernel. Third, the descriptors of each lattice are generated by treating the stability of the scale space as the description of the image. Moreover, a parallel version of the M-P algorithm is presented to accelerate computation. Finally, comprehensive experimental results reveal that the multi-pyramid structure, built by combining the coarse-to-fine spatial pyramid with the scale space, generates more effective features than the other related methods.
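The lattice-plus-scale-space construction can be sketched roughly as follows; the lattice sizes, the sigma schedule, and the use of ORB as the interest-point detector are illustrative assumptions, not the authors' exact settings.

```python
import cv2
import numpy as np

def lattice_scale_space(gray, levels=(1, 2, 4), sigmas=(1.0, 2.0, 4.0)):
    """Partition the image into increasingly finer lattices (coarse-to-fine pyramid),
    weight each cell by its interest-point count, and blur each cell with Gaussian
    kernels to form a small scale space per cell."""
    orb = cv2.ORB_create()                       # illustrative interest-point detector
    keypoints = orb.detect(gray, None)
    pts = np.array([kp.pt for kp in keypoints]) if keypoints else np.empty((0, 2))
    h, w = gray.shape
    pyramid = []
    for n in levels:                             # n x n lattice on this pyramid level
        cells = []
        for i in range(n):
            for j in range(n):
                y0, y1 = i * h // n, (i + 1) * h // n
                x0, x1 = j * w // n, (j + 1) * w // n
                cell = gray[y0:y1, x0:x1]
                # non-normalized weight: number of interest points inside the cell
                if len(pts):
                    inside = (pts[:, 0] >= x0) & (pts[:, 0] < x1) & \
                             (pts[:, 1] >= y0) & (pts[:, 1] < y1)
                    weight = int(inside.sum())
                else:
                    weight = 0
                scale_space = [cv2.GaussianBlur(cell, (0, 0), s) for s in sigmas]
                cells.append({"weight": weight, "scales": scale_space})
        pyramid.append(cells)
    return pyramid
```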
Because the pancreas occupies only a small region of an abdominal computed tomography (CT) scan and varies greatly in shape, location, and size, deep neural networks for automatic pancreas segmentation are easily confused by the complex and variable background. To alleviate these issues, this paper proposes a pancreas segmentation optimization based on a coarse-to-fine structure: the coarse stage increases the proportion of the target region in the input image through a minimum bounding box, while the fine stage improves segmentation accuracy by enhancing data diversity and introducing a new segmentation model, and reduces running time by adding a total-weights constraint. The optimization is evaluated on the public pancreas segmentation dataset and achieves an average Dice-Sørensen coefficient (DSC) of 87.87%, which is 0.94% higher than the 86.93% reported by state-of-the-art pancreas segmentation methods. Moreover, the method generalizes well and can easily be applied to other coarse-to-fine or single-stage organ segmentation tasks.
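A minimal sketch of the coarse-stage idea, cropping the CT volume to the minimum bounding box of a coarse prediction; the margin value and the NumPy-array interface are assumptions for illustration only.

```python
import numpy as np

def crop_to_coarse_roi(volume, coarse_mask, margin=20):
    """Coarse stage: crop the CT volume to the minimum bounding box of the coarse
    pancreas prediction (plus a safety margin) so the fine model sees a larger
    proportion of target voxels."""
    coords = np.argwhere(coarse_mask > 0)
    if coords.size == 0:
        return volume                                    # nothing detected; keep full volume
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + margin + 1, volume.shape)
    slices = tuple(slice(int(l), int(h)) for l, h in zip(lo, hi))
    return volume[slices]
```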
Semantic change detection (SCD) and land cover mapping (LCM) are usually treated as a dual task in the field of remote sensing. However, in diverse real-world scenarios many SCD categories are difficult to recognize clearly, such as "water-vegetation" and "water-tree", which can be regarded as fine-grained differences. In addition, even a single LCM category can be hard to define; for instance, some "vegetation" categories with little vegetation coverage are easily confused with the general "ground" category. SCD/LCM therefore becomes challenging under both its fine-grained nature and label ambiguity. In this paper, we tackle the SCD and LCM tasks simultaneously by proposing a coarse-to-fine attention tree (CAT) model. Specifically, it consists of an encoder, a decoder, and a coarse-to-fine attention tree module. The encoder-decoder structure first extracts high-level features from the input multi-temporal images and then reconstructs them to produce SCD and LCM predictions. The coarse-to-fine attention tree, on the one hand, uses the tree structure to model a hierarchy of categories by predicting the coarse-grained labels first and the fine-grained labels later; on the other hand, it applies an attention mechanism to capture discriminative pixel regions. Furthermore, to address label ambiguity in SCD/LCM, we equip the model with a label distribution learning loss. Experiments on the large-scale SECOND dataset show that the proposed CAT model outperforms state-of-the-art models, and various ablation studies demonstrate the effectiveness of the tailored designs in the CAT model for semantic change detection.
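A toy PyTorch sketch of the coarse-before-fine prediction idea (not the CAT architecture itself): the fine head is conditioned on the coarse logits. The 1x1 convolutions, channel sizes, and two-level hierarchy are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CoarseToFineHead(nn.Module):
    """Predict coarse-grained labels first, then condition the fine-grained
    prediction on the coarse output, mimicking a two-level category hierarchy."""
    def __init__(self, feat_ch, n_coarse, n_fine):
        super().__init__()
        self.coarse = nn.Conv2d(feat_ch, n_coarse, kernel_size=1)
        self.fine = nn.Conv2d(feat_ch + n_coarse, n_fine, kernel_size=1)

    def forward(self, feats):                       # feats: (B, feat_ch, H, W)
        coarse_logits = self.coarse(feats)
        fine_in = torch.cat([feats, coarse_logits.softmax(dim=1)], dim=1)
        fine_logits = self.fine(fine_in)
        return coarse_logits, fine_logits
```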
An approach to the stereo correspondence problem is presented that uses genetic algorithms (GAs) to obtain a dense disparity map. Unlike previous methods, this approach casts stereo matching as a multi-extrema optimization problem: finding the fittest solution among a set of candidate disparity maps. Among a wide variety of optimization techniques, GAs have proven to be effective for global optimization problems with large search spaces. Following this idea, each disparity map is viewed as an individual and the disparity values are encoded as chromosomes, so each individual carries many chromosomes. Several matching constraints are then formulated into an objective function, and GAs are used to search for the globally optimal solution. Furthermore, a coarse-to-fine strategy is embedded in the approach to reduce matching ambiguity and time consumption. Finally, experimental results on synthetic and real images demonstrate the performance of the approach.
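A rough sketch of the GA formulation under stated assumptions: each individual is a full disparity map, and fitness combines a simple photometric cost with a smoothness constraint. The population size, crossover scheme, mutation rate, and cost weights are illustrative, not the paper's settings, and the coarse-to-fine refinement is omitted.

```python
import numpy as np

def ga_disparity(left, right, d_max=16, pop=20, gens=50, mut=0.02):
    """Evolve a population of disparity maps toward the fittest solution."""
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    h, w = left.shape
    rng = np.random.default_rng(0)
    population = rng.integers(0, d_max, size=(pop, h, w))

    def fitness(d):
        cols = np.clip(np.arange(w) - d, 0, w - 1)            # warp right image by disparity
        photo = np.abs(left - np.take_along_axis(right, cols, axis=1)).mean()
        smooth = np.abs(np.diff(d, axis=1)).mean() + np.abs(np.diff(d, axis=0)).mean()
        return -(photo + 0.1 * smooth)                         # higher is better

    for _ in range(gens):
        scores = np.array([fitness(d) for d in population])
        parents = population[np.argsort(scores)[-pop // 2:]]   # keep the fitter half
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            mask = rng.random((h, w)) < 0.5                    # uniform crossover
            child = np.where(mask, a, b)
            flip = rng.random((h, w)) < mut                    # per-pixel mutation
            child = np.where(flip, rng.integers(0, d_max, size=(h, w)), child)
            children.append(child)
        population = np.concatenate([parents, np.array(children)], axis=0)
    return population[np.argmax([fitness(d) for d in population])]
```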
A coarse-to-fine strategy based on the minimal generalized time-bandwidth product is proposed, with one novel idea highlighted: adopting a coarse-to-fine strategy to speed up the search process. Simulation results on synthetic and real signals show the validity of the proposed method.
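The coarse-to-fine search strategy itself can be illustrated generically, as in the sketch below; the objective (the generalized time-bandwidth product of the paper) is left abstract, and the grid sizes and number of refinement rounds are purely illustrative.

```python
import numpy as np

def coarse_to_fine_search(objective, lo, hi, coarse_n=20, fine_n=20, rounds=3):
    """Scan a coarse grid over [lo, hi], then repeatedly zoom in around the
    best point with a finer grid, instead of exhaustively searching a dense grid."""
    best = lo
    for _ in range(rounds):
        grid = np.linspace(lo, hi, coarse_n)
        best = grid[np.argmin([objective(x) for x in grid])]
        step = (hi - lo) / coarse_n
        lo, hi = best - step, best + step        # narrow the interval around the optimum
        coarse_n = fine_n
    return best
```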
Near-duplicate image detection is a necessary operation for refining image search results for efficient user exploration. The existence of large numbers of near duplicates calls for fast and accurate automatic detection methods. We have designed a coarse-to-fine near-duplicate detection framework to speed up the process, together with a multi-modal integration scheme for accurate detection. Duplicate pairs are detected with both a global feature (partition-based color histogram) and local features (CPAM and a SIFT bag-of-words model). Experimental results on a large-scale data set demonstrate the effectiveness of the proposed design.
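A minimal sketch of the coarse-to-fine filtering idea: a cheap global color-histogram pass proposes candidate pairs, and a local-feature pass confirms them. ORB is used here as a stand-in for the paper's CPAM/SIFT features, and the thresholds are illustrative assumptions.

```python
import cv2

def near_duplicates(images, coarse_thresh=0.9, fine_thresh=0.25):
    """Return index pairs of likely near-duplicate images."""
    hists = [cv2.normalize(cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                                        [0, 256] * 3), None).flatten()
             for img in images]
    orb = cv2.ORB_create()
    feats = [orb.detectAndCompute(img, None)[1] for img in images]
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    pairs = []
    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            # coarse pass: global colour similarity
            if cv2.compareHist(hists[i], hists[j], cv2.HISTCMP_CORREL) < coarse_thresh:
                continue
            # fine pass: fraction of local descriptors that match
            if feats[i] is None or feats[j] is None:
                continue
            matches = matcher.match(feats[i], feats[j])
            if len(matches) / max(len(feats[i]), len(feats[j])) > fine_thresh:
                pairs.append((i, j))
    return pairs
```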
Knowledge selection is a challenging task that often suffers from semantic drift when knowledge is retrieved based on the semantic similarity between a fact and a question. In addition, weak correlations between fact-question pairs and the gigantic knowledge bases available for knowledge search are unavoidable issues. This paper presents a scalable approach to address these issues. A sparse encoder and a dense encoder are coupled iteratively to retrieve fact candidates from a large-scale knowledge base. A pre-trained language model, fine-tuned in two rounds on the results of the sparse and dense encoders, is then used to re-rank the fact candidates, and the top-k facts are selected by a dedicated re-ranker. The approach is applied to two textual inference datasets and one knowledge-grounded question answering dataset. Experimental results demonstrate that (1) the proposed approach improves knowledge selection by reducing semantic drift, and (2) it produces outstanding results on the benchmark datasets. The code is available at https://github.com/hhhhzs666/KSIHER.
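The sparse-plus-dense retrieval followed by re-ranking can be sketched with off-the-shelf components; the BM25 retriever, the specific sentence-transformer and cross-encoder checkpoints, and the cut-off sizes are assumptions standing in for the paper's fine-tuned models, and the iterative coupling is simplified to a single round.

```python
import numpy as np
from rank_bm25 import BM25Okapi                                  # sparse retriever (stand-in)
from sentence_transformers import SentenceTransformer, CrossEncoder

def select_facts(question, facts, k_sparse=100, k_dense=100, top_k=5):
    """Retrieve candidates with sparse and dense encoders, then re-rank their union."""
    bm25 = BM25Okapi([f.split() for f in facts])
    sparse_ids = np.argsort(bm25.get_scores(question.split()))[::-1][:k_sparse]

    bi = SentenceTransformer("all-MiniLM-L6-v2")                 # illustrative bi-encoder
    q_emb = bi.encode([question])[0]
    f_emb = bi.encode(facts)
    dense_ids = np.argsort(f_emb @ q_emb)[::-1][:k_dense]

    candidates = sorted(set(sparse_ids) | set(dense_ids))
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative re-ranker
    scores = reranker.predict([(question, facts[i]) for i in candidates])
    order = np.argsort(scores)[::-1][:top_k]
    return [facts[candidates[i]] for i in order]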
In the domain of point cloud registration, the coarse-to-fine feature matching paradigm has received significant attention due to its impressive performance. This paradigm involves a two-step process: first, the extraction of multilevel features, and subsequently, the propagation of correspondences from coarse to fine levels. However, the approach faces two notable limitations. First, the dual-softmax operation may promote one-to-one correspondences between superpoints, inadvertently excluding valuable correspondences. Second, it is crucial to examine the overlapping areas between point clouds closely, as only correspondences within these regions determine the actual transformation. Considering these issues, we propose OAAFormer to enhance correspondence quality. On the one hand, we introduce a soft matching mechanism to propagate potentially valuable correspondences from coarse to fine levels. On the other hand, we integrate an overlapping-region detection module to minimize mismatches as far as possible. Furthermore, we introduce a region-wise attention module with linear complexity during the fine-level matching phase, designed to enhance the discriminative capability of the extracted features. Tests on the challenging 3DLoMatch benchmark demonstrate that our approach increases the inlier ratio by about 7% and registration recall by 2%-4%. Finally, to accelerate prediction, we replace the conventional Random Sample Consensus (RANSAC) algorithm with the selection of a limited yet representative set of high-confidence correspondences, resulting in a 100-fold speedup while maintaining comparable registration performance.
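A small sketch of the soft matching idea, contrasted with strict one-to-one dual-softmax selection: every correspondence whose dual-softmax confidence clears a threshold is kept, up to k partners per superpoint. The values of k, the temperature, and the threshold are illustrative assumptions.

```python
import torch

def soft_matching(feat_src, feat_tgt, k=3, tau=0.1, thresh=0.05):
    """Return (index pairs, confidences) of soft superpoint correspondences."""
    sim = feat_src @ feat_tgt.T / tau                    # (N, M) similarity
    conf = sim.softmax(dim=1) * sim.softmax(dim=0)       # dual-softmax confidence
    topv, topi = conf.topk(k, dim=1)                     # several partners per source point
    rows = torch.arange(conf.size(0)).unsqueeze(1).expand_as(topi)
    keep = topv > thresh                                 # drop low-confidence pairs
    return torch.stack([rows[keep], topi[keep]], dim=1), topv[keep]
```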
Humongous amounts of data bring various challenges to face image retrieval. This paper proposes an efficient method to address them. First, we use accurate facial landmark locations as shape features. Second, we utilise shape priors to provide discriminative texture features for convolutional neural networks; these shape and texture features are fused to make the learned representation more robust. Finally, to increase efficiency, a coarse-to-fine search mechanism is exploited to find similar objects efficiently. Extensive experiments on the CASIA-WebFace, MSRA-CFW, and LFW datasets illustrate the superiority of our method.
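A coarse-to-fine search over fused descriptors can be sketched as a shortlist-then-rerank scheme; using truncated descriptors as the coarse code and the particular shortlist and dimension sizes are illustrative assumptions rather than the paper's mechanism.

```python
import numpy as np

def coarse_to_fine_retrieval(query, gallery, coarse_dim=32, shortlist=200, top_k=10):
    """Cheap coarse pass on truncated descriptors builds a shortlist;
    the fine pass re-ranks the shortlist with the full fused descriptors."""
    d_coarse = np.linalg.norm(gallery[:, :coarse_dim] - query[:coarse_dim], axis=1)
    cand = np.argsort(d_coarse)[:shortlist]                      # coarse pass
    d_fine = np.linalg.norm(gallery[cand] - query, axis=1)       # fine pass
    return cand[np.argsort(d_fine)[:top_k]]
```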
Images from a monocular camera can be processed to infer depth information about obstacles in the blind spot area captured by a vehicle's side-view camera. The depth information is given as a classification result ("near" or "far") when two blocks in the image are compared with respect to their distances, and it can be used for blind spot area detection. In this paper, the depth information is inferred from a combination of blur cues and texture cues, and it is estimated by comparing the features of two image blocks selected within a single image. A preliminary experiment demonstrates that a convolutional neural network (CNN) model trained by deep learning on a set of relatively ideal images achieves good accuracy. The same CNN model is applied to distinguish near and far obstacles according to a specified threshold in the vehicle blind spot area, and promising results are obtained. The proposed method uses a standard blind spot camera and can improve safety without additional sensing devices; thus, it has the potential to be applied in vehicular applications for detecting objects in the driver's blind spot.
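A toy PyTorch sketch of the pairwise block comparison: two blocks are stacked on the channel axis and a small CNN classifies which is nearer. The 64x64 block size and layer sizes are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class BlockPairNet(nn.Module):
    """Classify a pair of image blocks as 'near' vs 'far' from their combined appearance."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, 2)     # two classes: near / far

    def forward(self, block_a, block_b):                  # each: (B, 3, 64, 64)
        x = torch.cat([block_a, block_b], dim=1)          # stack the two blocks channel-wise
        x = self.features(x).flatten(1)
        return self.classifier(x)
```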
A shot presents a contiguous action recorded by an uninterrupted camera operation, and frames within a shot maintain spatio-temporal coherence. Segmenting a serial video stream into meaningful shots is the first step in video analysis and content-based video understanding. In this paper, a novel scheme based on improved two-dimensional entropy is proposed to partition video shots. First, shot transition candidates are detected using a two-pass algorithm: a coarse searching pass followed by a fine searching pass. Second, using the two-dimensional entropy of the image, correctly detected transition candidates are further classified into different transition types, while falsely detected shot breaks are identified and removed. Finally, the boundary of a gradual transition is precisely located by incorporating the two-dimensional entropy characteristics into the gradual transition. A large number of video sequences are used to test the system, and promising results are obtained.
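The two-pass candidate search can be sketched as below, assuming an array of inter-frame differences (for example, differences of the two-dimensional entropy between consecutive frames); the step size and the two thresholds are illustrative assumptions.

```python
import numpy as np

def detect_shot_candidates(frame_diff, coarse_step=10, coarse_thresh=0.5, fine_thresh=0.8):
    """Coarse pass: scan every coarse_step-th inter-frame difference for rough hits.
    Fine pass: re-examine every frame in a window around each hit and keep the peak."""
    candidates = []
    for i in range(0, len(frame_diff), coarse_step):          # coarse searching pass
        if frame_diff[i] > coarse_thresh:
            lo, hi = max(0, i - coarse_step), min(len(frame_diff), i + coarse_step)
            j = lo + int(np.argmax(frame_diff[lo:hi]))        # fine searching pass
            if frame_diff[j] > fine_thresh and j not in candidates:
                candidates.append(j)
    return candidates
```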