Funding: Supported by the National Natural Science Foundation of China (Nos. 62276204 and 62203343), the Fundamental Research Funds for the Central Universities (No. YJSJ24011), the Natural Science Basic Research Program of Shanxi, China (Nos. 2022JM-340 and 2023-JC-QN-0710), and the China Postdoctoral Science Foundation (Nos. 2020T130494 and 2018M633470).
Abstract: Drone-based small object detection is of great significance in practical applications such as military operations, disaster rescue, and transportation. However, the severe scale differences among objects captured by drones and the lack of detail information for small-scale objects make drone-based small object detection a formidable challenge. To address these issues, we first develop a mathematical model to explore how changing the receptive field affects polynomial fitting results. Based on the conclusions obtained, we then propose a simple but effective Hybrid Receptive Field Network (HRFNet), whose modules include Hybrid Feature Augmentation (HFA), Hybrid Feature Pyramid (HFP) and Dual Scale Head (DSH). Specifically, HFA employs parallel dilated convolution kernels of different sizes to extend shallow features with different receptive fields, improving the multi-scale adaptability of the network; HFP enhances the perception of small objects by capturing contextual information across layers; and DSH reconstructs the original prediction head using a set of high-resolution and ultrahigh-resolution features. In addition, a corresponding dual-scale loss function is designed to train HRFNet. Finally, comprehensive evaluation results on public benchmarks such as VisDrone-DET and TinyPerson demonstrate the robustness of the proposed method. Notably, the proposed HRFNet achieves a mAP of 51.0 on VisDrone-DET with 29.3 M parameters, outperforming existing state-of-the-art detectors. HRFNet also performs well in complex scenarios captured by drones, achieving the best performance on the CS-Drone dataset we built.
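As a rough illustration of why parallel dilated kernels of different sizes yield different receptive fields, the sketch below computes the span covered by a single dilated kernel along one axis using the standard relation k + (k - 1)(d - 1). The branch settings are hypothetical and are not taken from the HRFNet paper; the function names are likewise only illustrative.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in input pixels) covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def parallel_branch_spans(branches):
    """Return the effective span of each (kernel_size, dilation) branch."""
    return [effective_kernel_size(k, d) for k, d in branches]


# Hypothetical parallel branches as (kernel_size, dilation) pairs;
# these values are assumptions for illustration only.
branches = [(3, 1), (3, 2), (5, 2)]
spans = parallel_branch_spans(branches)
print(spans)
```

Each branch sees the same shallow feature map over a different spatial extent, which is the property the abstract attributes to HFA.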
Funding: Supported by the National Natural Science Foundation of China (Nos. 62072127, 62002076, 61906049); the Natural Science Foundation of Guangdong Province (Nos. 2023A1515011774, 2020A1515010423); Project 6142111180404 supported by CNKLSTISS; the Science and Technology Program of Guangzhou, China (No. 202002030131); the Guangdong Basic and Applied Basic Research Fund, Joint Fund Youth Fund (No. 2019A1515110213); the Open Fund Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF-IPIC202101); and the Scientific Research Project for Guangzhou University (No. RP2022003).
Abstract: Recent state-of-the-art semi-supervised learning (SSL) methods usually use data augmentations as core components. Such methods, however, are limited to simple transformations, such as augmentations under the instance's naive representations or under the instance's semantic representations. To tackle this problem, we offer a unique insight into data augmentations and propose a novel data-augmentation-based semi-supervised learning method, called Attentive Neighborhood Feature Augmentation (ANFA). The motivation of our method lies in the observation that the relationship between a given feature and its neighborhood may help construct more reliable transformations for the data, and further help the classifier distinguish ambiguous features in low-density regions. Specifically, we first project the labeled and unlabeled data points into an embedding space and then construct a neighbor graph that serves as a similarity measure based on similar representations in the embedding space. Then, we employ an attention mechanism to transform the target features into augmented ones based on the neighbor graph. Finally, we formulate a novel semi-supervised loss by encouraging the predictions of the interpolations of augmented features to be consistent with the corresponding interpolations of the predictions of the target features. We carried out experiments on the SVHN and CIFAR-10 benchmark datasets, and the experimental results demonstrate that our method outperforms state-of-the-art methods when the number of labeled examples is limited.
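The neighbor-graph and attention steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ANFA implementation: cosine similarity as the graph measure, a softmax over the k nearest neighbors as the attention weights, and the names `knn_indices`, `attentive_augment` and `temperature` are all hypothetical. The interpolation-consistency loss is not covered here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_indices(features, i, k):
    """Indices of the k features most similar to features[i] (excluding i)."""
    sims = [(cosine(features[i], f), j) for j, f in enumerate(features) if j != i]
    sims.sort(reverse=True)
    return [j for _, j in sims[:k]]


def attentive_augment(features, i, k=2, temperature=1.0):
    """Replace features[i] by a softmax-weighted mix of its k neighbors.

    Assumption: attention scores are cosine similarities divided by a
    temperature; the source does not specify the scoring function.
    """
    nbrs = knn_indices(features, i, k)
    scores = [cosine(features[i], features[j]) / temperature for j in nbrs]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[i])
    return [sum(w * features[j][d] for w, j in zip(weights, nbrs)) for d in range(dim)]


# Toy embeddings (made up for illustration).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
augmented = attentive_augment(features, 0, k=2)
print(augmented)
```

Because the weights are a softmax, the augmented feature is a convex combination of the selected neighbors and therefore stays inside the local neighborhood of the target.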
Funding: Supported in part by the 14th Five-Year Project of the Ministry of Science and Technology of China (2021YFD2000304), the Fundamental Research Funds for the Central Universities (531118010509), and the Natural Science Foundation of Hunan Province, China (2021JJ40114).
Abstract: Automatic pavement crack detection is a critical task for maintaining pavement stability and driving safety. The task is challenging because shadows on the pavement may have an intensity similar to that of cracks, which interferes with crack detection performance. To date, efficient algorithms and training datasets for handling the interference caused by shadows are still lacking. To fill this gap, we make several contributions. First, we propose a new pavement shadow and crack dataset, which contains a variety of shadow and pavement pixel size combinations. It also covers all common cracks (linear cracks and network cracks), placing higher demands on crack detection methods. Second, we design a two-step shadow-removal-oriented crack detection approach, SROCD, which improves performance by first removing the shadow and then detecting the crack. In addition to shadows, the method can cope with other noise disturbances. Third, we explore the mechanism by which shadows affect crack detection. Based on this mechanism, we propose a data augmentation method based on the difference in brightness values, which can adapt to brightness changes caused by seasonal and weather changes. Finally, we introduce a residual feature augmentation algorithm to detect small cracks that can be early signs of sudden disasters, and the algorithm improves the overall performance of the model. We compare our method with state-of-the-art methods on existing pavement crack datasets and the shadow-crack dataset, and the experimental results demonstrate the superiority of our method.
Abstract: Derivative and volatility attributes can be usefully calculated from recorded gamma ray (GR) data to enhance lithofacies classification in wellbores penetrating multiple lithologies. Such attributes extract information about the log curve shape that cannot be readily discerned from the recorded well log data. A logged wellbore section for which 8911 data records are available for the three recorded logs (GR, sonic (DT) and bulk density (PB)) is evaluated. That section demonstrates the value of the GR attributes for machine learning (ML) lithofacies predictions. Five feature selection configurations are considered. The 9-var configuration, including GR, DT, PB and six GR attributes, and the 7-var configuration, consisting of GR and the six GR attributes, provide the most accurate and reproducible lithofacies predictions. The other three feature configurations evaluated do not include the GR attributes but just one to three of the recorded log features. The results of seven ML models and two regression models reveal that K-nearest neighbor (KNN), random forest (RF) and extreme gradient boosting (XGB) are the best performing models. They generate between 14 and 23 misclassifications from 8911 data records for the 9-var model. Multi-layer perceptron (MLP) and support vector classification (SVC) do not perform well with the 7-var model, which lacks the PB feature that displays the highest correlation with facies class. Annotated confusion matrices reveal that the KNN, RF and XGB models can effectively distinguish all facies classes for the 9-var and 7-var configurations (which include the GR attributes), whereas none of the models can achieve that outcome for the 3-var configuration (which excludes the GR attributes). Accurately distinguishing lithofacies using well-log data in sedimentary sections is an important objective in applied geoscience. The straightforward GR-attribute method proposed improves confidence in ML-lithofacies classifications based on limited recorded well-log data.
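To make the idea of curve-shape attributes concrete, the sketch below derives two simple features from a GR curve: a first difference between consecutive depth samples and a trailing-window standard deviation. The abstract does not define its six GR attributes, so these two definitions, the window length, and the sample values are assumptions chosen only for illustration.

```python
import statistics


def first_derivative(values):
    """Backward difference between consecutive samples; first entry is 0.0."""
    return [0.0] + [values[i] - values[i - 1] for i in range(1, len(values))]


def rolling_volatility(values, window):
    """Population standard deviation over a trailing window of samples."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(statistics.pstdev(chunk))
    return out


# Made-up GR readings (API units) at consecutive depth samples.
gr = [50.0, 52.0, 51.0, 80.0, 82.0]
gr_derivative = first_derivative(gr)
gr_volatility = rolling_volatility(gr, window=2)

# Each row combines the recorded value with its derived attributes,
# ready to be passed to a classifier as extra input features.
feature_rows = list(zip(gr, gr_derivative, gr_volatility))
print(feature_rows)
```

The jump from 51 to 80 shows up as a large derivative and a large volatility value at the same sample, which is the kind of boundary information the raw GR value alone does not expose.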
Funding: Supported by the National Science Foundation of China (No. 62176055) and the China University S&T Innovation Plan Guided by the Ministry of Education.
Abstract: In multi-dimensional classification (MDC), the semantics of objects are characterized by multiple class spaces from different dimensions. Most MDC approaches try to explicitly model the dependencies among class spaces in the output space. In contrast, the recently proposed feature augmentation strategy, which aims at manipulating the feature space, has also been shown to be an effective solution for MDC. However, existing feature augmentation approaches focus only on designing holistic augmented features to be appended to the original features, while better generalization performance could be achieved by exploiting multiple kinds of augmented features. In this paper, we propose a selective feature augmentation strategy that focuses on synergizing multiple kinds of augmented features. Specifically, by assuming that only part of the augmented features is pertinent and useful for each dimension's model induction, we derive a classification model that fully utilizes the original features while conducting feature selection on the augmented features. To validate the effectiveness of the proposed strategy, we generate three kinds of simple augmented features based on standard kNN, weighted kNN, and maximum margin techniques, respectively. Comparative studies show that the proposed strategy achieves superior performance against both state-of-the-art MDC approaches and its degenerated versions that use only one kind of augmented features.
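One plausible reading of a standard-kNN augmented feature is sketched below: for each class dimension, count how many of the k nearest training examples fall into each class, then append those counts to the original feature vector. This construction, the squared Euclidean distance, and the names `knn_class_counts` and `augment` are assumptions for illustration; the weighted-kNN and maximum-margin variants and the feature selection step of the proposed strategy are not shown.

```python
def knn_class_counts(train_X, train_labels, x, k, n_classes):
    """Count class labels among the k training points nearest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), idx)
        for idx, row in enumerate(train_X)
    )
    counts = [0] * n_classes
    for _, idx in dists[:k]:
        counts[train_labels[idx]] += 1
    return counts


def augment(train_X, train_Y, x, k, classes_per_dim):
    """Append per-dimension kNN class counts to the original features of x."""
    augmented = list(x)
    for dim, n_classes in enumerate(classes_per_dim):
        labels = [row[dim] for row in train_Y]
        augmented += knn_class_counts(train_X, labels, x, k, n_classes)
    return augmented


# Toy data: two class dimensions with 2 and 3 classes respectively.
train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_Y = [[0, 1], [0, 0], [1, 2], [1, 2]]
x_aug = augment(train_X, train_Y, [0.1, 0.2], k=2, classes_per_dim=[2, 3])
print(x_aug)
```

The augmented vector carries information from every class dimension, so a per-dimension classifier trained on it can pick up cross-dimension dependencies through the feature space rather than the output space.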
Funding: Funded by the Fundamental Research Funds for the Central Universities (No. 2020ZDPY0214).
Abstract: Coal-gangue object detection has attracted substantial attention because it is the core of realizing vision-based intelligent and green coal separation. However, most existing studies have focused on laboratory datasets and prioritized lightweight models. This makes it difficult for coal-gangue object detection to adapt to the complex and harsh scenes of real production environments. Therefore, our project collected and labeled image datasets of coal and gangue under real production conditions from a coal preparation plant. We then designed a one-stage object detection model, named STATNet, following the "backbone-neck-head" architecture with the aim of enhancing detection accuracy in industrial coal preparation scenarios. The proposed model uses a Swin Transformer as the backbone module to extract multi-scale features, an improved path augmentation feature pyramid network (iPAFPN) as the neck module to enrich feature fusion, and a task-aligned head (TAH) as the head module to mitigate conflicts and misalignments between the classification and localization tasks. Experimental results on a real-world industrial dataset demonstrate that the proposed STATNet model achieves an AP50 of 89.27%, surpassing several state-of-the-art baseline models by 2.02% to 5.58%. Additionally, it exhibits stronger robustness in resisting image corruption and perturbation. These findings demonstrate its promising prospects in practical coal and gangue separation applications.