Accurate prediction of landslide displacement is crucial for effective early warning of landslide disasters. While most existing prediction methods focus on time-series forecasting for individual monitoring points, there is limited research on the spatiotemporal characteristics of landslide deformation. This paper proposes a novel Multi-Relation Spatiotemporal Graph Residual Network with Multi-Level Feature Attention (MFA-MRSTGRN) that improves the prediction performance of landslide displacement through spatiotemporal fusion. The model integrates internal seepage factors, used as data feature enhancements, with external triggering factors, allowing for the construction of a multi-source heterogeneous dataset and accurate capture of the complex spatiotemporal characteristics of landslide displacement. The MFA-MRSTGRN model incorporates dynamic graph theory and four key modules: multi-level feature attention, temporal-residual decomposition, spatial multi-relational graph convolution, and spatiotemporal fusion prediction. This approach enables efficient analysis of multi-source heterogeneous datasets, facilitating adaptive exploration of the evolving multi-relational, multi-dimensional spatiotemporal complexities in landslides. Applied to displacement prediction for the Liangshuijing landslide, the MFA-MRSTGRN model surpasses traditional models such as random forest (RF), long short-term memory (LSTM), and spatial temporal graph convolutional networks (ST-GCN) on several evaluation metrics, with a mean absolute error (MAE) of 1.27 mm, a root mean square error (RMSE) of 1.49 mm, a mean absolute percentage error (MAPE) of 0.026, and an R-squared (R²) of 0.88. Furthermore, feature ablation experiments indicate that incorporating internal seepage factors improves the predictive performance of landslide displacement models. This research provides an advanced and reliable method for landslide displacement prediction.
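The four evaluation metrics quoted in the abstract above (MAE, RMSE, MAPE, R²) have standard definitions, sketched below in plain Python. This is only an illustration of the textbook formulas; the function name `regression_metrics` and the sample values are hypothetical, and the abstract does not say how the authors computed their figures (for example, whether MAPE is reported as a fraction or a percentage; a fraction is assumed here).

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (as a fraction) and R^2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the true value, so it is undefined if any true value is zero
    # (a ZeroDivisionError is raised in that case).
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    # R^2 is undefined when all true values are identical (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}

# Hypothetical displacement values in mm, for illustration only.
metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

Lower MAE, RMSE and MAPE and a higher R² indicate a better fit, which is the sense in which the abstract compares the models.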
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
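The idea of a smaller "aggregate" problem can be illustrated with a minimal sketch of hard aggregation, in which each original state is assigned to exactly one aggregate state (for instance by its feature value) and the aggregate costs are found by value iteration. The function name, the uniform disaggregation weights, and the toy problem are assumptions made for illustration; the paper's own scheme, including feature construction with neural networks, is not reproduced here.

```python
def aggregate_value_iteration(P, g, alpha, groups, iters=500):
    """Value iteration on a hard-aggregated discounted MDP (cost minimization).

    P[u][i][j]: probability of moving from state i to state j under control u.
    g[u][i]:    expected one-stage cost of applying control u at state i.
    alpha:      discount factor in (0, 1).
    groups[i]:  index of the aggregate state that original state i belongs to;
                every index 0..max(groups) is assumed to have at least one member.
    Returns r, one cost per aggregate state; J(i) is approximated by r[groups[i]].
    """
    n = len(groups)
    m = max(groups) + 1
    members = [[i for i in range(n) if groups[i] == x] for x in range(m)]
    r = [0.0] * m
    for _ in range(iters):
        new_r = []
        for x in range(m):
            total = 0.0
            for i in members[x]:
                # Q-factor of each control at i, with the cost-to-go of every
                # successor j replaced by the cost of its aggregate state.
                q = [
                    g[u][i] + alpha * sum(P[u][i][j] * r[groups[j]] for j in range(n))
                    for u in range(len(P))
                ]
                total += min(q)
            # Uniform disaggregation: average over the members of the group.
            new_r.append(total / len(members[x]))
        r = new_r
    return r

# Toy problem: two states, one control, both states move to state 1.
P = [[[0.0, 1.0], [0.0, 1.0]]]
g = [[1.0, 0.0]]
exact = aggregate_value_iteration(P, g, 0.9, groups=[0, 1])   # no aggregation
coarse = aggregate_value_iteration(P, g, 0.9, groups=[0, 0])  # both states merged
```

With `groups=[0, 1]` each state is its own aggregate state and the result coincides with ordinary value iteration; merging both states into one group yields a single shared cost, which shows how the approximation is piecewise constant over the groups.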
The self-attention mechanism of Transformers, which captures long-range contextual information, has demonstrated significant potential in image segmentation. However, their ability to learn local, contextual relationships between pixels requires further improvement. Previous methods face challenges in efficiently managing multi-scale features of different granularities from the encoder backbone, leaving room for improvement in their global representation and feature extraction capabilities. To address these challenges, we propose a novel Decoder with Multi-Head Feature Receptors (DMHFR), which receives multi-scale features from the encoder backbone and organizes them into three feature groups with different granularities: coarse, fine-grained, and full set. These groups are subsequently processed by Multi-Head Feature Receptors (MHFRs) after feature capture and modeling operations. The MHFRs comprise two Three-Head Feature Receptors (THFRs) and one Four-Head Feature Receptor (FHFR). Each group of features is passed through these MHFRs and then fed into axial transformers, which help the model capture long-range dependencies within the features. The three MHFRs produce three distinct feature outputs. The output from the FHFR serves as auxiliary features in the prediction head, and the prediction outputs and their losses are eventually aggregated. Experimental results show that the Transformer using DMHFR outperforms 15 state-of-the-art (SOTA) methods on five public datasets. Specifically, it achieves significant improvements in mean Dice scores over the classic Parallel Reverse Attention Network (PraNet) method, with gains of 4.1%, 2.2%, 1.4%, 8.9%, and 16.3% on the CVC-ClinicDB, Kvasir-SEG, CVC-T, CVC-ColonDB, and ETIS-LaribPolypDB datasets, respectively.
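The Dice score used for the comparison above measures the overlap between a predicted and a ground-truth mask as 2|A∩B| / (|A| + |B|). A minimal sketch for binary masks stored as nested lists follows; the function name and the smoothing constant `eps` are illustrative assumptions, and the abstract does not describe the authors' exact evaluation code.

```python
def dice_score(pred_mask, true_mask, eps=1e-7):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks (nested lists)."""
    flat_pred = [int(bool(v)) for row in pred_mask for v in row]
    flat_true = [int(bool(v)) for row in true_mask for v in row]
    if len(flat_pred) != len(flat_true):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(flat_pred, flat_true))
    total = sum(flat_pred) + sum(flat_true)
    # eps keeps the score defined (and equal to 1) when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)

# One overlapping pixel, three foreground pixels in total: score is about 2/3.
score = dice_score([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

A mean Dice score, as reported per dataset, is the average of this quantity over all test images.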
Although conventional object detection methods achieve high accuracy through extensively annotated datasets, acquiring such large-scale labeled data remains challenging and cost-prohibitive in numerous real-world applications. Few-shot object detection offers a new research direction that aims to localize and classify objects in images using only limited annotated examples. However, the inherent challenge in few-shot object detection is that sample diversity is insufficient to fully characterize the sample feature distribution, which in turn degrades model performance. Inspired by contrastive learning principles, we propose an Implicit Feature Contrastive Learning (IFCL) module to address this limitation and augment feature diversity for more robust representation learning. This module generates augmented support sample features in a mixed feature space and implicitly contrasts them with query Region of Interest (RoI) features. This approach facilitates more comprehensive learning of both intra-class feature similarity and inter-class feature diversity, thereby enhancing the model’s object classification and localization capabilities. Extensive experiments on PASCAL VOC show that our method achieves improvements of 3.2%, 1.8%, and 2.3% in the 10-shot setting on the three Novel Sets, respectively, compared to the baseline model FPD.
Object detection is a fundamental task in computer vision that involves identifying and localizing objects within an image. Local features, extracted by convolutions and similar operations, capture fine-grained details such as edges and textures, while global features, extracted by fully connected layers and similar operations, represent the overall structure and long-range relationships within the image. These features are crucial for accurate object detection, yet most existing methods focus on aggregating local and global features, often overlooking the importance of medium-range dependencies. To address this gap, we propose a novel full perception module (FP-Module), a simple yet effective feature extraction module designed to simultaneously capture local details, medium-range dependencies, and long-range dependencies. Building on this, we construct a full perception head (FP-Head) by cascading multiple FP-Modules, enabling the prediction layer to leverage the most informative features. Experimental results on the MS COCO dataset demonstrate that our approach significantly enhances object recognition and localization, achieving gains of 2.7 to 5.7 APval when integrated into standard object detectors. Notably, the FP-Module is a universal solution that can be seamlessly incorporated into existing detectors to boost performance. The code will be released at https://github.com/Idcogroup/FP-Head.
To address the low detection accuracy caused by the varying scales of apple leaf disease spots and their similarity to the background, this paper proposes a multi-scale lightweight network (MSL-Net). First, a multiplexed aggregated feature extraction network is built from a residual bottleneck block (RES-Bottleneck) and middle partial-convolution (MP-Conv) to capture multi-scale spatial features and sharpen the focus on disease features, so that disease targets are better separated from background information. Second, a lightweight feature fusion network is designed using scale-fuse concatenation (SF-Cat) and a triple-scale sequence feature fusion (TSSF) module to merge multi-scale feature maps comprehensively. Depthwise convolution (DWConv) and GhostNet lighten the network, while the cross stage partial bottleneck with 3 convolutions ghost-normalization attention module (C3-GN) reduces missed detections by suppressing irrelevant background information. Finally, soft non-maximum suppression (Soft-NMS) is used in the post-processing stage to alleviate misdetection at dense disease sites. The results show that MSL-Net improves mean average precision at an intersection over union of 0.5 (mAP@0.5) by 2.0% over the baseline you only look once version 5s (YOLOv5s) while reducing parameters by 44% and computation by 27%, outperforming other state-of-the-art (SOTA) models overall. The method also performs well compared with the latest research.
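Soft-NMS, mentioned in the abstract above as the post-processing step, lowers the scores of boxes that overlap a higher-scoring box instead of discarding them outright, which helps when true detections are densely packed. The sketch below shows the commonly used Gaussian variant; the abstract does not state which variant or which parameter values MSL-Net uses, so `sigma`, `score_threshold`, and the function names are assumptions.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS.

    Returns a list of (original_index, rescored_score) in the order boxes were selected.
    """
    remaining = list(enumerate(scores))
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best_pos = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_idx, best_score = remaining.pop(best_pos)
        kept.append((best_idx, best_score))
        # Decay the scores of the other boxes according to their overlap with it.
        decayed = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            new_score = score * math.exp(-(overlap ** 2) / sigma)
            if new_score >= score_threshold:   # drop boxes whose score has collapsed
                decayed.append((idx, new_score))
        remaining = decayed
    return kept

# Two heavily overlapping boxes and one isolated box (hypothetical values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
result = soft_nms(boxes, [0.9, 0.8, 0.7])
```

In this example the second box is kept but with a reduced score, whereas classic NMS with a typical IoU threshold of 0.5 would have removed it entirely.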
Classroom behavior recognition is an active research topic that plays a vital role in assessing and improving the quality of classroom teaching. However, existing classroom behavior recognition methods struggle to achieve high recognition accuracy on datasets with problems such as blurred images and inconsistent objects. To address this challenge, we propose an effective, lightweight object detection method called the RFNet model (YOLO-FR). Specifically, for efficient multi-scale feature extraction, an effective feature pyramid shared convolutional (FPSC) module is designed to improve feature extraction performance in the backbone by applying convolutional layers with varying dilation rates to the input image. Secondly, to address the problem of multi-scale variability in the scene, we design the Rep Ghost fusion Cross Stage Partial and Efficient Layer Aggregation Network (RGCSPELAN) to further improve network performance and reduce the amount of computation and the number of parameters. We conduct experimental evaluation on the SCB dataset3 and the STBD-08 dataset. Experimental results indicate that, compared to the baseline model, the RFNet model increases mean average precision (mAP@50) from 69.6% to 71.0% on the SCB dataset3 and from 91.8% to 93.1% on the STBD-08 dataset. RFNet attains a precision of 68.6%, surpassing the baseline method (YOLOv11) by 3.3%, and achieves the smallest model size (4.9 M) on the SCB dataset3. Finally, comparisons with other algorithms show that RFNet accurately detects student behavior in complex classroom environments, confirming that it is well suited to efficient, real-time recognition of classroom behaviors.
Stance detection is the task of identifying the attitude expressed toward a standpoint. Previous work on stance detection has focused on feature extraction but ignored the fact that irrelevant features act as noise during higher-level abstraction. Moreover, because the target is not always mentioned in the text, most methods have ignored target information. To solve these problems, we propose a neural network ensemble method that combines the temporal dependency modeling of long short-term memory (LSTM) with the strong feature extraction performance of convolutional neural networks (CNNs). The method obtains multi-level features that account for both local and global information. We also introduce attention mechanisms to magnify features related to target information. Furthermore, we employ sparse coding to remove noise and obtain characteristic features. Applying sparse coding on top of attention and feature extraction further improved performance. We evaluate our approach on the SemEval-2016 Task 6-A public dataset, achieving a performance that exceeds the benchmark and those of participating teams.
The number of blogs and other forms of opinionated online content has increased dramatically in recent years. Many fields, including academia and national security, place an emphasis on automated political article orientation detection. Political articles (especially in the Arab world) differ from other articles in their subjectivity: the author’s beliefs and political affiliation might have a significant influence on a political article. With categories representing the main political ideologies, this problem may be thought of as a subset of text categorization (classification). In general, the performance of machine learning models for text classification is sensitive to hyperparameter settings. Furthermore, the feature vector used to represent a document must capture, to some extent, the complex semantics of natural language. To this end, this paper presents an intelligent system to detect political Arabic article orientation that adapts the categorical boosting (CatBoost) method combined with a multi-level feature concept. Extracting features at multiple levels can enhance the model’s ability to discriminate between different classes or patterns. Each level may capture different aspects of the input data, contributing to a more comprehensive representation. CatBoost, a robust and efficient gradient-boosting algorithm, is utilized to effectively learn and predict the complex relationships between these features and the political orientation labels associated with the articles. A dataset of political Arabic texts collected from diverse sources, including postings and articles, is used to assess the suggested technique. Conservative, reform, and revolutionary are the three subcategories of these opinions. The results of this study demonstrate that, compared to other frequently used machine learning models for text classification, the CatBoost method using multi-level features performs better, with an accuracy of 98.14%.
Respiratory sound classification is significant in clinical diagnosis. However, existing convolutional neural network (CNN)-based methods face challenges arising from limited training data and insufficient capability in modeling long-term temporal dependencies in respiratory sounds. To address these issues, a bidirectional message-passing feature aggregation network (BMFAN) that integrates CNN and transformer architectures is proposed, enabling effective fusion of local spectral characteristics and global temporal dependencies. To alleviate feature misalignment between the two modules, 1 × 1 convolution, average pooling, and interpolation operations are employed. In addition, a spectrum encoder is designed to enhance time-frequency feature representations, and FilterAugment, a frequency band weighting strategy, is introduced to optimize spectral data distribution by emphasizing diagnostically relevant frequency regions. Experimental results on the International Conference on Biomedical and Health Informatics (ICBHI) 2017 dataset demonstrate that BMFAN achieves competitive performance in both two-class and four-class respiratory sound classification tasks.
Recently, there have been some attempts to apply Transformers to 3D point cloud classification. To reduce computation, most existing methods focus on local spatial attention, but ignore point content and fail to establish relationships between distant but relevant points. To overcome the limitation of local spatial attention, we propose a point content-based Transformer architecture, called PointConT for short. It exploits the locality of points in the feature space (content-based): sampled points with similar features are clustered into the same class and self-attention is computed within each class, thus enabling an effective trade-off between capturing long-range dependencies and computational complexity. We further introduce an inception feature aggregator for point cloud classification, which uses parallel structures to aggregate high-frequency and low-frequency information in each branch separately. Extensive experiments show that our PointConT model achieves remarkable performance on point cloud shape classification. In particular, our method reaches 90.3% Top-1 accuracy on the hardest setting of ScanObjectNN. Source code of this paper is available at https://github.com/yahuiliu99/PointConT.
Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving (AD) system. However, most proposed methods address only one of these two challenges with a single model. To tackle this dilemma, this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting (ST-SIGMA), an efficient end-to-end method to jointly and accurately perceive the AD environment and forecast the trajectories of the surrounding traffic agents within a unified framework. ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird’s-eye view (BEV) maps simultaneously. Specifically, an iterative aggregation network is first employed as the scene semantic encoder (SSE) to learn diverse scene information. To preserve dynamic interactions of traffic agents, ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder. Meanwhile, a simple yet efficient feature fusion method is designed to fuse semantic and interaction features into a unified feature space, which serves as the input to a novel hierarchical aggregation decoder for downstream prediction tasks. Extensive experiments on the nuScenes data set demonstrate that the proposed ST-SIGMA achieves significant improvements over state-of-the-art (SOTA) methods in both scene perception and trajectory forecasting. The proposed approach also outperforms SOTA in terms of model generalisation and robustness and is therefore more feasible for deployment in real-world AD scenarios.
As an important part of the new generation of information technology, the Internet of Things (IoT) has attracted wide attention and is regarded as an enabling technology for the next generation of health care systems. Fundus photography equipment is connected to the cloud platform through the IoT, enabling real-time uploading of fundus images and the rapid issuance of diagnostic suggestions by artificial intelligence. At the same time, important security and privacy issues have emerged. The data uploaded to the cloud platform involves personal attributes, health status, and medical application data of patients. Once this data is leaked, abused, or improperly disclosed, personal information security is violated. Therefore, it is important to address the security and privacy issues that arise when massive numbers of medical and healthcare devices connect to the infrastructure of IoT healthcare and health systems. To meet this challenge, we propose MIA-UNet, a multi-scale iterative aggregation U-network, which aims to achieve accurate and efficient retinal vessel segmentation for ophthalmic auxiliary diagnosis while keeping computational complexity low enough for mobile terminals. In this way, users do not need to upload data to the cloud platform and can analyze and process fundus images on their own mobile terminals, thus eliminating the leakage of personal information. Specifically, the interconnection between encoder and decoder, as well as the internal connections between decoder subnetworks in the classic U-Net, are redefined and redesigned. Furthermore, we propose a hybrid loss function to smooth the gradient and deal with the imbalance between foreground and background. Compared with U-Net, the segmentation performance of the proposed network is significantly improved while the number of parameters increases by only 2%. When applied to three publicly available datasets, DRIVE, STARE, and CHASE_DB1, the proposed network achieves accuracy/F1-score values of 96.33%/84.34%, 97.12%/83.17%, and 97.06%/84.10%, respectively. The experimental results show that MIA-UNet is superior to the state-of-the-art methods.
Estimation of the velocity profile within the mud depth is a long-standing and essential problem in debris flow dynamics. Until now, various velocity profiles have been proposed based on fitting analysis of experimental measurements, but these are often limited by the observation conditions, such as the number of configured sensors. As a result, the resulting linear velocity profiles usually have limitations in reproducing the time-varying and nonlinear behavior of the debris flow process. In this study, we present a novel approach to explore the debris flow velocity profile in detail, building on our previous 3D-HBPSPH numerical model, i.e., the three-dimensional Smoothed Particle Hydrodynamics (SPH) model incorporating the Herschel-Bulkley-Papanastasiou rheology. Specifically, we propose a stratification aggregation algorithm for interpreting the details of SPH particles, which enables the recording of temporal velocities of debris flow at different mud depths. To analyze the velocity profile, we introduce a logarithmic-based nonlinear model with two key parameters: a, which controls the shape of the velocity profile, and b, which governs its temporal evolution. We verify the proposed velocity profile and explore its sensitivity using 34 sets of velocity data from three individual flume experiments in previous literature. Our results demonstrate that the proposed time-varying nonlinear velocity profile outperforms the previous linear profiles.
Traffic scene captioning technology automatically generates one or more sentences describing a traffic scene by analyzing the content of the input traffic scene images, supporting road safety while providing an important decision-making function for sustainable transportation. To provide a comprehensive and reasonable description of complex traffic scenes, this paper proposes a traffic scene semantic captioning model with multi-stage feature enhancement. In general, the model follows an encoder-decoder structure. First, multi-level granularity visual features are used for feature enhancement during the encoding process, which enables the model to learn more detailed content in the traffic scene image. Second, the scene knowledge graph is applied to the decoding process, and the semantic features it provides are used to enhance the features learned by the decoder again, so that the model can learn the attributes of objects in the traffic scene and the relationships between objects to generate more reasonable captions. This paper reports extensive experiments on the challenging MS-COCO dataset, evaluated with five standard automatic evaluation metrics. The results show that the proposed model improves significantly on all metrics compared with state-of-the-art methods, notably achieving a score of 129.0 on the CIDEr-D metric, which also indicates that the proposed model can effectively provide a more reasonable and comprehensive description of the traffic scene.
This article introduces the concept of load aggregation, which involves a comprehensive analysis of loads to acquire their external characteristics for the purpose of modeling and analyzing power systems. The online identification method is a computer-involved approach for data collection, processing, and system identification, commonly used for adaptive control and prediction. This paper proposes a method for dynamically aggregating large-scale adjustable loads to support high proportions of new energy integration, aiming to study the aggregation characteristics of regional large-scale adjustable loads using online identification techniques and feature extraction methods. The experiment selected 300 central air conditioners as the research subject and analyzed their regulation characteristics, economic efficiency, and comfort. The experimental results show that as the adjustment time of the air conditioner increases from 5 minutes to 35 minutes, the stable adjustment quantity during the adjustment period decreases from 28.46 to 3.57, indicating that air conditioning loads can be controlled over a long period and have better adjustment effects in the short term. Overall, the experimental results of this paper demonstrate that analyzing the aggregation characteristics of regional large-scale adjustable loads using online identification techniques and feature extraction algorithms is effective.
As an indispensable part of identity authentication, offline writer identification plays a notable role in biology, forensics, and historical document analysis. However, identifying handwriting efficiently, stably, and quickly is still challenging because of the way handwriting features are extracted and processed. In this paper, we propose an efficient system to identify writers through handwritten images, which integrates local and global features from similar handwritten images. The local features are modeled by effective aggregate processing, and global features are extracted through transfer learning. Specifically, the proposed system employs a pre-trained Residual Network to mine the relationship between large image sets and specific handwritten images, while the vector of locally aggregated descriptors with double power normalization is employed in aggregating local and global features. Moreover, handwritten image segmentation, preprocessing, enhancement, optimization of neural network architecture, and normalization for local and global features are exploited, significantly improving system performance. The proposed system is evaluated on the Computer Vision Lab (CVL) datasets and the International Conference on Document Analysis and Recognition (ICDAR) 2013 datasets. The results show that it exhibits good generalizability and achieves state-of-the-art performance. Furthermore, the system performs better when trained on complete handwriting patches with the normalization method. The experimental results indicate that it is important to segment handwriting reasonably while dealing with handwriting overlap, which reduces visual burstiness.
Accurate identification of fungal species is essential for effective diagnosis and treatment. Traditional microscopy-based methods are often subjective and time-consuming. Deep learning has emerged as a promising tool in this domain. However, existing deep learning models often struggle to generalise in the presence of class imbalance and subtle morphological differences, which are common in fungal image datasets. This study proposes MASA-Net, a deep learning framework that combines a fine-tuned DenseNet201 backbone with a multi-aspect channel-spatial attention (MASA) module. The attention mechanism refines spatial and channel-wise features by capturing multi-scale spatial patterns and adaptively emphasising informative channels. This enhances the network's ability to focus on diagnostically relevant fungal structures while suppressing irrelevant features. The MASA-Net is evaluated on the DeFungi dataset and demonstrates superior performance in terms of accuracy, precision, recall and F1-score. It also outperforms established attention mechanisms such as squeeze-and-excitation networks (SE) and convolutional block attention module (CBAM) under identical conditions. These results highlight MASA-Net's robustness and effectiveness in addressing class imbalance and structural variability, offering a reliable solution for automated fungal species identification.
Funding (MFA-MRSTGRN landslide displacement study): the National Natural Science Foundation of China (Grant No. 52308340); the Chongqing Talent Innovation and Entrepreneurship Demonstration Team Project (Grant No. cstc2024ycjh-bgzxm0012); and the Science and Technology Projects supported by China Coal Technology and Engineering Chongqing Design and Research Institute (Group) Co., Ltd. (Grant No. H20230317).
文摘Accurate prediction of landslide displacement is crucial for effective early warning of landslide disasters.While most existing prediction methods focus on time-series forecasting for individual monitoring points,there is limited research on the spatiotemporal characteristics of landslide deformation.This paper proposes a novel Multi-Relation Spatiotemporal Graph Residual Network with Multi-Level Feature Attention(MFA-MRSTGRN)that effectively improves the prediction performance of landslide displacement through spatiotemporal fusion.This model integrates internal seepage factors as data feature enhancements with external triggering factors,allowing for accurate capture of the complex spatiotemporal characteristics of landslide displacement and the construction of a multi-source heterogeneous dataset.The MFA-MRSTGRN model incorporates dynamic graph theory and four key modules:multilevel feature attention,temporal-residual decomposition,spatial multi-relational graph convolution,and spatiotemporal fusion prediction.This comprehensive approach enables the efficient analyses of multi-source heterogeneous datasets,facilitating adaptive exploration of the evolving multi-relational,multi-dimensional spatiotemporal complexities in landslides.When applying this model to predict the displacement of the Liangshuijing landslide,we demonstrate that the MFA-MRSTGRN model surpasses traditional models,such as random forest(RF),long short-term memory(LSTM),and spatial temporal graph convolutional networks(ST-GCN)models in terms of various evaluation metrics including mean absolute error(MAE=1.27 mm),root mean square error(RMSE=1.49 mm),mean absolute percentage error(MAPE=0.026),and R-squared(R^(2)=0.88).Furthermore,feature ablation experiments indicate that incorporating internal seepage factors improves the predictive performance of landslide displacement models.This research provides an advanced and reliable method for landslide displacement prediction.
Abstract: In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
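For orientation, the exact (non-aggregated) policy iteration that the abstract takes as its starting point can be sketched on a tiny finite-state discounted problem. This is a minimal sketch and not the paper's feature-based aggregation scheme: the function name, the data layout `P[u][i][j]` and `g[u][i]`, and the two-state example are all assumptions made for illustration, and policy evaluation is done by repeated sweeps rather than by solving the linear system directly.

```python
def policy_iteration(P, g, alpha, tol=1e-10):
    """Policy iteration for a finite-state discounted cost-minimization problem.

    P[u][i][j] : probability of moving from state i to state j under control u
    g[u][i]    : expected stage cost of applying control u in state i
    alpha      : discount factor, 0 < alpha < 1
    Returns (policy, J), where J[i] is the cost of the final policy from state i.
    """
    n_controls, n = len(P), len(P[0])

    def q_value(u, i, J):
        # Cost of applying u at state i, then following a policy with cost J.
        return g[u][i] + alpha * sum(P[u][i][j] * J[j] for j in range(n))

    policy = [0] * n
    while True:
        # Policy evaluation: iterate the policy's Bellman equation to a fixed point.
        J = [0.0] * n
        while True:
            J_new = [q_value(policy[i], i, J) for i in range(n)]
            delta = max(abs(a - b) for a, b in zip(J, J_new))
            J = J_new
            if delta < tol:
                break

        # Policy improvement: pick the control with the smallest Q-value,
        # keeping the current control on (near-)ties to avoid cycling.
        new_policy = []
        for i in range(n):
            q = [q_value(u, i, J) for u in range(n_controls)]
            best = min(range(n_controls), key=lambda u: q[u])
            if q[policy[i]] <= q[best] + 1e-9:
                best = policy[i]
            new_policy.append(best)

        if new_policy == policy:
            return policy, J
        policy = new_policy


# Hypothetical two-state example: control 0 stays put, control 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # control 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # control 1: switch
]
g = [
    [2.0, 0.0],  # stage cost of staying in state 0 / state 1
    [0.5, 1.0],  # stage cost of switching out of state 0 / state 1
]
policy, J = policy_iteration(P, g, alpha=0.9)
```

In this example the method settles on switching out of state 0 and staying in state 1. The aggregation approach described in the abstract replaces the exact evaluation step with one carried out on a smaller problem defined over features.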
Funding: Supported by the Xiamen Medical and Health Guidance Project in 2021 (No. 3502Z20214ZD1070) and by a grant from the Guangxi Key Laboratory of Machine Vision and Intelligent Control, China (No. 2023B02).
Abstract: The self-attention mechanism of Transformers, which captures long-range contextual information, has demonstrated significant potential in image segmentation. However, their ability to learn local, contextual relationships between pixels requires further improvement. Previous methods face challenges in efficiently managing multi-scale features of different granularities from the encoder backbone, leaving room for improvement in their global representation and feature extraction capabilities. To address these challenges, we propose a novel Decoder with Multi-Head Feature Receptors (DMHFR), which receives multi-scale features from the encoder backbone and organizes them into three feature groups with different granularities: coarse, fine-grained, and full set. These groups are subsequently processed by Multi-Head Feature Receptors (MHFRs) after feature capture and modeling operations. MHFRs include two Three-Head Feature Receptors (THFRs) and one Four-Head Feature Receptor (FHFR). Each group of features is passed through these MHFRs and then fed into axial transformers, which help the model capture long-range dependencies within the features. The three MHFRs produce three distinct feature outputs. The output from the FHFR serves as auxiliary features in the prediction head, and the prediction outputs and their losses are eventually aggregated. Experimental results show that the Transformer using DMHFR outperforms 15 state-of-the-art (SOTA) methods on five public datasets. Specifically, it achieved significant improvements in mean DICE scores over the classic Parallel Reverse Attention Network (PraNet) method, with gains of 4.1%, 2.2%, 1.4%, 8.9%, and 16.3% on the CVC-ClinicDB, Kvasir-SEG, CVC-T, CVC-ColonDB, and ETIS-LaribPolypDB datasets, respectively.
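The DICE score used to report these gains is the standard overlap measure between a predicted mask and a ground-truth mask. A minimal sketch follows, assuming binary masks flattened to lists of 0s and 1s; the function name and the toy masks are hypothetical, and the convention of returning 1.0 when both masks are empty is one common choice rather than something stated in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty: treated here as a perfect match (a convention).
        return 1.0
    return 2.0 * intersection / total


# Toy 4-pixel masks (illustration only): one overlapping pixel out of two each.
predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 1, 0]
score = dice_score(predicted_mask, reference_mask)
```

A mean DICE score, as reported per dataset above, would then be the average of this value over the images in that dataset.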
Funding: Funded by the China Chongqing Municipal Science and Technology Bureau, grant numbers CSTB2024TIAD-CYKJCXX0009, CSTB2024NSCQ-LZX0043, CSTB2022NSCQ-MSX0288; the Chongqing Municipal Commission of Housing and Urban-Rural Development, grant number CKZ2024-87; the Chongqing University of Technology Graduate Education High-Quality Development Project, grant number gzlsz202401; the Chongqing University of Technology-Chongqing LINGLUE Technology Co., Ltd. Electronic Information (Artificial Intelligence) Graduate Joint Training Base; the Postgraduate Education and Teaching Reform Research Project in Chongqing, grant number yjg213116; and the Chongqing University of Technology-CISDI Chongqing Information Technology Co., Ltd. Computer Technology Graduate Joint Training Base.
Abstract: Although conventional object detection methods achieve high accuracy through extensively annotated datasets, acquiring such large-scale labeled data remains challenging and cost-prohibitive in numerous real-world applications. Few-shot object detection presents a new research idea that aims to localize and classify objects in images using only limited annotated examples. However, the inherent challenge in few-shot object detection lies in the insufficient sample diversity to fully characterize the sample feature distribution, which consequently impacts model performance. Inspired by contrastive learning principles, we propose an Implicit Feature Contrastive Learning (IFCL) module to address this limitation and augment feature diversity for more robust representational learning. This module generates augmented support sample features in a mixed feature space and implicitly contrasts them with query Region of Interest (RoI) features. This approach facilitates more comprehensive learning of both intra-class feature similarity and inter-class feature diversity, thereby enhancing the model's object classification and localization capabilities. Extensive experiments on PASCAL VOC show that our method achieves improvements of 3.2%, 1.8%, and 2.3% in the 10-shot setting on the three Novel Sets, respectively, compared to the baseline model FPD.
Funding: Supported by the National Natural Science Foundation of China (62371350, 62171324, 62471338, U1903214).
Abstract: Object detection is a fundamental task in computer vision that involves identifying and localizing objects within an image. Local features extracted by convolutions, etc., capture fine-grained details such as edges and textures, while global features extracted by fully connected layers, etc., represent the overall structure and long-range relationships within the image. These features are crucial for accurate object detection, yet most existing methods focus on aggregating local and global features, often overlooking the importance of medium-range dependencies. To address this gap, we propose a novel full perception module (FP-Module), a simple yet effective feature extraction module designed to simultaneously capture local details, medium-range dependencies, and long-range dependencies. Building on this, we construct a full perception head (FP-Head) by cascading multiple FP-Modules, enabling the prediction layer to leverage the most informative features. Experimental results on the MS COCO dataset demonstrate that our approach significantly enhances object recognition and localization, achieving gains of 2.7 to 5.7 AP^val when integrated into standard object detectors. Notably, the FP-Module is a universal solution that can be seamlessly incorporated into existing detectors to boost performance. The code will be released at https://github.com/Idcogroup/FP-Head.
Abstract: To address the low detection accuracy caused by the varying scales of apple leaf disease spots and their similarity to the background, this paper proposes a multi-scale lightweight network (MSL-Net). Firstly, a multiplexed aggregated feature extraction network is proposed using the residual bottleneck block (RES-Bottleneck) and middle partial-convolution (MP-Conv) to capture multi-scale spatial features and enhance focus on disease features for better differentiation between disease targets and background information. Secondly, a lightweight feature fusion network is designed using the scale-fuse concatenation (SF-Cat) and triple-scale sequence feature fusion (TSSF) modules to merge multi-scale feature maps comprehensively. Depthwise convolution (DWConv) and GhostNet lighten the network, while the cross stage partial bottleneck with 3 convolutions ghost-normalization attention module (C3-GN) reduces missed detections by suppressing irrelevant background information. Finally, soft non-maximum suppression (Soft-NMS) is used in the post-processing stage to reduce misdetection of dense disease sites. The results show that MSL-Net improves mean average precision at an intersection over union of 0.5 (mAP@0.5) by 2.0% over the baseline you only look once version 5s (YOLOv5s) while reducing parameters by 44% and computation by 27%, outperforming other state-of-the-art (SOTA) models overall. This method also shows excellent performance compared to the latest research.
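The Soft-NMS post-processing step mentioned in this abstract lowers the scores of overlapping boxes instead of discarding them outright, which helps when targets are densely packed. A minimal sketch of the Gaussian variant follows. Whether MSL-Net uses the Gaussian or the linear variant is not stated in the abstract, so the decay rule, the default `sigma`, the score threshold, and the example boxes are all assumptions for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS. Returns (index, final_score) pairs in selection order."""
    candidates = [(i, boxes[i], scores[i]) for i in range(len(boxes))]
    kept = []
    while candidates:
        # Select the highest-scoring remaining box.
        best_pos = max(range(len(candidates)), key=lambda k: candidates[k][2])
        best_idx, best_box, best_score = candidates.pop(best_pos)
        kept.append((best_idx, best_score))

        # Decay the scores of the remaining boxes according to their overlap
        # with the selected box; drop those that fall below the threshold.
        rescored = []
        for idx, box, score in candidates:
            overlap = iou(best_box, box)
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_threshold:
                rescored.append((idx, box, score))
        candidates = rescored
    return kept


# Hypothetical detections: two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
```

With hard NMS at a typical IoU threshold of 0.5, the second box would be removed entirely; here it survives with a reduced score and is ranked after the non-overlapping third box.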
Funding: Supported by the Fundamental Research Grant Scheme (FRGS) of Universiti Sains Malaysia, Research Number: FRGS/1/2024/ICT02/USM/02/1.
Abstract: Classroom behavior recognition is a hot research topic that plays a vital role in assessing and improving the quality of classroom teaching. However, existing classroom behavior recognition methods struggle to achieve high recognition accuracy on datasets with problems such as blurred pictures and inconsistent objects. To address this challenge, we propose an effective, lightweight object detection method called the RFNet model (YOLO-FR). Specifically, for efficient multi-scale feature extraction, an effective feature pyramid shared convolutional (FPSC) module is designed to improve feature extraction performance by applying convolutional layers with varying dilation rates to the input image in the backbone. Secondly, to address the problem of multi-scale variability in the scene, we design the Rep Ghost fusion Cross Stage Partial and Efficient Layer Aggregation Network (RGCSPELAN) to further improve network performance and reduce the amount of computation and the number of parameters. In addition, we conduct experimental evaluation on the SCB dataset3 and STBD-08 datasets. Experimental results indicate that, compared to the baseline model, the RFNet model increases mean average precision (mAP@50) from 69.6% to 71.0% on the SCB dataset3 and from 91.8% to 93.1% on the STBD-08 dataset. The RFNet approach reaches a precision of 68.6%, surpassing the baseline method (YOLOv11) by 3.3%, and achieves the smallest model size (4.9 M) on the SCB dataset3. Finally, comparison with other algorithms shows that it accurately detects student behavior in complex classroom environments, confirming that RFNet is well suited for real-time, efficient recognition of classroom behaviors.
Funding: This work is supported by the Fundamental Research Funds for the Central Universities (Grant No. 2572019BH03).
Abstract: Stance detection is the task of attitude identification toward a standpoint. Previous work on stance detection has focused on feature extraction but ignored the fact that irrelevant features exist as noise during higher-level abstraction. Moreover, because the target is not always mentioned in the text, most methods have ignored target information. In order to solve these problems, we propose a neural network ensemble method that combines the temporal dependence modeling of long short-term memory (LSTM) with the excellent feature extraction performance of convolutional neural networks (CNNs). The method can obtain multi-level features that consider both local and global features. We also introduce attention mechanisms to magnify target information-related features. Furthermore, we employ sparse coding to remove noise and obtain characteristic features. Performance was improved by using sparse coding on top of attention and feature extraction. We evaluate our approach on the SemEval-2016 Task 6-A public dataset, achieving a performance that exceeds the benchmark and those of participating teams.
Abstract: The number of blogs and other forms of opinionated online content has increased dramatically in recent years. Many fields, including academia and national security, place an emphasis on automated political article orientation detection. Political articles (especially in the Arab world) are different from other articles due to their subjectivity, in which the author's beliefs and political affiliation might have a significant influence on a political article. With categories representing the main political ideologies, this problem may be thought of as a subset of text categorization (classification). In general, the performance of machine learning models for text classification is sensitive to hyperparameter settings. Furthermore, the feature vector used to represent a document must capture, to some extent, the complex semantics of natural language. To this end, this paper presents an intelligent system to detect political Arabic article orientation that adapts the categorical boosting (CatBoost) method combined with a multi-level feature concept. Extracting features at multiple levels can enhance the model's ability to discriminate between different classes or patterns. Each level may capture different aspects of the input data, contributing to a more comprehensive representation. CatBoost, a robust and efficient gradient-boosting algorithm, is utilized to effectively learn and predict the complex relationships between these features and the political orientation labels associated with the articles. A dataset of political Arabic texts collected from diverse sources, including postings and articles, is used to assess the suggested technique. Conservative, reform, and revolutionary are the three subcategories of these opinions. The results of this study demonstrate that, compared to other frequently used machine learning models for text classification, the CatBoost method using multi-level features performs better, with an accuracy of 98.14%.
Funding: Supported by the State Grid Corporation Science and Technology Project Funding (5700-202422243A-1-1-ZN).
Abstract: Respiratory sound classification is significant in clinical diagnosis. However, existing convolutional neural network (CNN)-based methods face challenges arising from limited training data and insufficient capability in modeling long-term temporal dependencies in respiratory sounds. To address these issues, a bidirectional message-passing feature aggregation network (BMFAN) that integrates CNN and transformer architectures is proposed, enabling effective fusion of local spectral characteristics and global temporal dependencies. To alleviate feature misalignment between the two modules, 1 × 1 convolution, average pooling, and interpolation operations are employed. In addition, a spectrum encoder is designed to enhance time-frequency feature representations, and FilterAugment, a frequency band weighting strategy, is introduced to optimize spectral data distribution by emphasizing diagnostically relevant frequency regions. Experimental results on the International Conference on Biomedical and Health Informatics (ICBHI) 2017 dataset demonstrate that BMFAN achieves competitive performance in both two-class and four-class respiratory sound classification tasks.
Funding: Supported in part by the National Natural Science Foundation of China (61876011); the National Key Research and Development Program of China (2022YFB4703700); the Key Research and Development Program 2020 of Guangzhou (202007050002); and the Key-Area Research and Development Program of Guangdong Province (2020B090921003).
Abstract: Recently, there have been some attempts to apply Transformers to 3D point cloud classification. In order to reduce computations, most existing methods focus on local spatial attention, but ignore point content and fail to establish relationships between distant but relevant points. To overcome the limitation of local spatial attention, we propose a point content-based Transformer architecture, called PointConT for short. It exploits the locality of points in the feature space (content-based), clustering the sampled points with similar features into the same class and computing the self-attention within each class, thus enabling an effective trade-off between capturing long-range dependencies and computational complexity. We further introduce an inception feature aggregator for point cloud classification, which uses parallel structures to aggregate high-frequency and low-frequency information in each branch separately. Extensive experiments show that our PointConT model achieves a remarkable performance on point cloud shape classification. In particular, our method exhibits 90.3% Top-1 accuracy on the hardest setting of ScanObjectNN. Source code of this paper is available at https://github.com/yahuiliu99/PointConT.
Funding: Basic and Advanced Research Projects of CSTC, Grant/Award Number: cstc2019jcyj-zdxmX0008; Science and Technology Research Program of Chongqing Municipal Education Commission, Grant/Award Numbers: KJQN202100634, KJZDK201900605; National Natural Science Foundation of China, Grant/Award Number: 62006065.
Abstract: Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving (AD) system. However, most proposed methods aim at addressing only one of these two challenges with a single model. To tackle this dilemma, this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting (ST-SIGMA), an efficient end-to-end method to jointly and accurately perceive the AD environment and forecast the trajectories of the surrounding traffic agents within a unified framework. ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird's-eye view (BEV) maps simultaneously. Specifically, an iterative aggregation network is first employed as the scene semantic encoder (SSE) to learn diverse scene information. To preserve dynamic interactions of traffic agents, ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder. Meanwhile, a simple yet efficient feature fusion method is designed to fuse semantic and interaction features into a unified feature space as the input to a novel hierarchical aggregation decoder for downstream prediction tasks. Extensive experiments on the nuScenes data set have demonstrated that the proposed ST-SIGMA achieves significant improvements compared to the state-of-the-art (SOTA) methods in terms of both scene perception and trajectory forecasting. The proposed approach also outperforms SOTA in terms of model generalisation and robustness and is therefore more feasible for deployment in real-world AD scenarios.
Funding: This work was supported in part by the National Natural Science Foundation of China (Nos. 62072074, 62076054, 62027827, 61902054); the Frontier Science and Technology Innovation Projects of National Key R&D Program (No. 2019QY1405); the Sichuan Science and Technology Innovation Platform and Talent Plan (No. 2020JDJQ0020); the Sichuan Science and Technology Support Plan (No. 2020YFSY0010); and the Natural Science Foundation of Guangdong Province (No. 2018A030313354).
Abstract: As an important part of the new generation of information technology, the Internet of Things (IoT) has attracted wide attention and is regarded as an enabling technology of the next generation of health care systems. Fundus photography equipment is connected to the cloud platform through the IoT, so as to realize the real-time uploading of fundus images and the rapid issuance of diagnostic suggestions by artificial intelligence. At the same time, important security and privacy issues have emerged. The data uploaded to the cloud platform involves more personal attributes, health status and medical application data of patients. Once leaked, abused or improperly disclosed, personal information security will be violated. Therefore, it is important to address the security and privacy issues of massive medical and healthcare equipment connecting to the infrastructure of IoT healthcare and health systems. To meet this challenge, we propose MIA-UNet, a multi-scale iterative aggregation U-network, which aims to achieve accurate and efficient retinal vessel segmentation for ophthalmic auxiliary diagnosis while ensuring that the network has low computational complexity to adapt to mobile terminals. In this way, users do not need to upload the data to the cloud platform, and can analyze and process the fundus images on their own mobile terminals, thus eliminating the leakage of personal information. Specifically, the interconnection between encoder and decoder, as well as the internal connection between decoder subnetworks in the classic U-Net, are redefined and redesigned. Furthermore, we propose a hybrid loss function to smooth the gradient and deal with the imbalance between foreground and background. Compared with U-Net, the segmentation performance of the proposed network is significantly improved while the number of parameters increases by only 2%. When applied to three publicly available datasets, DRIVE, STARE and CHASE DB1, the proposed network achieves accuracy/F1-score values of 96.33%/84.34%, 97.12%/83.17% and 97.06%/84.10%, respectively. The experimental results show that MIA-UNet is superior to the state-of-the-art methods.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52078493); the Natural Science Foundation of Hunan Province (Grant No. 2022JJ30700); the Natural Science Foundation for Excellent Young Scholars of Hunan (Grant No. 2021JJ20057); the Science and Technology Plan Project of Changsha (Grant No. kq2305006); and the Innovation Driven Program of Central South University (Grant No. 2023CXQD033).
Abstract: Estimation of the velocity profile within the mud depth is a long-standing and essential problem in debris flow dynamics. Until now, various velocity profiles have been proposed based on the fitting analysis of experimental measurements, but these are often limited by the observation conditions, such as the number of configured sensors. Therefore, the resulting linear velocity profiles usually exhibit limitations in reproducing the temporal-varied and nonlinear behavior during the debris flow process. In this study, we present a novel approach to explore the debris flow velocity profile in detail based on our previous 3D-HBPSPH numerical model, i.e., the three-dimensional Smoothed Particle Hydrodynamic model incorporating the Herschel-Bulkley-Papanastasiou rheology. Specifically, we propose a stratification aggregation algorithm for interpreting the details of SPH particles, which enables the recording of temporal velocities of debris flow at different mud depths. To analyze the velocity profile, we introduce a logarithmic-based nonlinear model with two key parameters: a, which controls the shape of the velocity profile, and b, which concerns its temporal evolution. We verify the proposed velocity profile and explore its sensitivity using 34 sets of velocity data from three individual flume experiments in previous literature. Our results demonstrate that the proposed temporal-varied nonlinear velocity profile outperforms the previous linear profiles.
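The Herschel-Bulkley-Papanastasiou rheology named in this abstract is commonly written as a Herschel-Bulkley power law whose yield-stress term is smoothed by an exponential, so that the shear stress stays finite and continuous as the shear rate approaches zero. The sketch below shows that commonly cited form only. The function and parameter names and the numerical values are hypothetical, and the abstract does not give the exact formulation or parameters used in the 3D-HBPSPH model.

```python
import math


def hbp_shear_stress(shear_rate, yield_stress, consistency, flow_index, m):
    """Shear stress (Pa) under the Papanastasiou-regularized Herschel-Bulkley law.

    tau = tau_y * (1 - exp(-m * gamma_dot)) + K * gamma_dot ** n

    shear_rate   : gamma_dot, in 1/s, must be non-negative
    yield_stress : tau_y, in Pa
    consistency  : K, in Pa * s^n
    flow_index   : n, dimensionless
    m            : regularization parameter, in s (larger m is closer to the
                   unregularized Herschel-Bulkley model)
    """
    if shear_rate < 0:
        raise ValueError("shear_rate must be non-negative")
    regularized_yield = yield_stress * (1.0 - math.exp(-m * shear_rate))
    return regularized_yield + consistency * shear_rate ** flow_index


def hbp_apparent_viscosity(shear_rate, yield_stress, consistency, flow_index, m):
    """Apparent viscosity (Pa * s), defined as shear stress divided by shear rate."""
    if shear_rate <= 0:
        raise ValueError("shear_rate must be positive")
    stress = hbp_shear_stress(shear_rate, yield_stress, consistency, flow_index, m)
    return stress / shear_rate


# Hypothetical material parameters (illustration only).
stress = hbp_shear_stress(4.0, yield_stress=100.0, consistency=10.0,
                          flow_index=0.5, m=1000.0)
viscosity = hbp_apparent_viscosity(4.0, yield_stress=100.0, consistency=10.0,
                                   flow_index=0.5, m=1000.0)
```

At this shear rate the exponential term is negligible, so the stress is essentially the yield stress plus the power-law contribution; at zero shear rate the stress is zero, which is the purpose of the regularization.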
Funding: Funded by (i) the Natural Science Foundation China (NSFC) under Grant Nos. 61402397, 61263043, 61562093 and 61663046; (ii) the Open Foundation of Key Laboratory in Software Engineering of Yunnan Province, No. 2020SE304; and (iii) the Practical Innovation Project of Yunnan University, Project Nos. 2021z34, 2021y128 and 2021y129.
Abstract: Traffic scene captioning technology automatically generates one or more sentences to describe the content of traffic scenes by analyzing the content of the input traffic scene images, ensuring road safety while providing an important decision-making function for sustainable transportation. In order to provide a comprehensive and reasonable description of complex traffic scenes, a traffic scene semantic captioning model with multi-stage feature enhancement is proposed in this paper. In general, the model follows an encoder-decoder structure. First, multilevel granularity visual features are used for feature enhancement during the encoding process, which enables the model to learn more detailed content in the traffic scene image. Second, the scene knowledge graph is applied to the decoding process, and the semantic features provided by the scene knowledge graph are used to enhance the features learned by the decoder again, so that the model can learn the attributes of objects in the traffic scene and the relationships between objects to generate more reasonable captions. This paper reports extensive experiments on the challenging MS-COCO dataset, evaluated by five standard automatic evaluation metrics, and the results show that the proposed model has improved significantly in all metrics compared with the state-of-the-art methods, especially achieving a score of 129.0 on the CIDEr-D evaluation metric, which also indicates that the proposed model can effectively provide a more reasonable and comprehensive description of the traffic scene.
Funding: Supported by the State Grid Science & Technology Project (5100-202114296A-0-0-00).
Abstract: This article introduces the concept of load aggregation, which involves a comprehensive analysis of loads to acquire their external characteristics for the purpose of modeling and analyzing power systems. The online identification method is a computer-involved approach for data collection, processing, and system identification, commonly used for adaptive control and prediction. This paper proposes a method for dynamically aggregating large-scale adjustable loads to support high proportions of new energy integration, aiming to study the aggregation characteristics of regional large-scale adjustable loads using online identification techniques and feature extraction methods. The experiment selected 300 central air conditioners as the research subject and analyzed their regulation characteristics, economic efficiency, and comfort. The experimental results show that as the adjustment time of the air conditioner increases from 5 minutes to 35 minutes, the stable adjustment quantity during the adjustment period decreases from 28.46 to 3.57, indicating that air conditioning loads can be controlled over a long period and have better adjustment effects in the short term. Overall, the experimental results of this paper demonstrate that analyzing the aggregation characteristics of regional large-scale adjustable loads using online identification techniques and feature extraction algorithms is effective.
Funding: Supported in part by the Postgraduate Research & Practice Innovation Program of Jiangsu Province under Grant KYCX 20_0758; in part by the Science and Technology Research Project of Jiangsu Public Security Department under Grant 2020KX005; in part by the General Project of Philosophy and Social Science Research in Colleges and Universities in Jiangsu Province under Grant 2022SJYB0473; and in part by the "Cyberspace Security" Construction Project of Jiangsu Provincial Key Discipline during the "14th Five Year Plan".
Abstract: As an indispensable part of identity authentication, offline writer identification plays a notable role in biology, forensics, and historical document analysis. However, identifying handwriting efficiently, stably, and quickly is still challenging due to the method of extracting and processing handwriting features. In this paper, we propose an efficient system to identify writers through handwritten images, which integrates local and global features from similar handwritten images. The local features are modeled by effective aggregate processing, and global features are extracted through transfer learning. Specifically, the proposed system employs a pre-trained Residual Network to mine the relationship between large image sets and specific handwritten images, while the vector of locally aggregated descriptors with double power normalization is employed in aggregating local and global features. Moreover, handwritten image segmentation, preprocessing, enhancement, optimization of neural network architecture, and normalization for local and global features are exploited, significantly improving system performance. The proposed system is evaluated on the Computer Vision Lab (CVL) datasets and the International Conference on Document Analysis and Recognition (ICDAR) 2013 datasets. The results show that it exhibits good generalizability and achieves state-of-the-art performance. Furthermore, the system performs better when training on complete handwriting patches with the normalization method. The experimental results indicate that it is important to segment handwriting reasonably while dealing with handwriting overlap, which reduces visual burstiness.
Abstract: Accurate identification of fungal species is essential for effective diagnosis and treatment. Traditional microscopy-based methods are often subjective and time-consuming. Deep learning has emerged as a promising tool in this domain. However, existing deep learning models often struggle to generalise in the presence of class imbalance and subtle morphological differences, which are common in fungal image datasets. This study proposes MASA-Net, a deep learning framework that combines a fine-tuned DenseNet201 backbone with a multi-aspect channel-spatial attention (MASA) module. The attention mechanism refines spatial and channel-wise features by capturing multi-scale spatial patterns and adaptively emphasising informative channels. This enhances the network's ability to focus on diagnostically relevant fungal structures while suppressing irrelevant features. MASA-Net is evaluated on the DeFungi dataset and demonstrates superior performance in terms of accuracy, precision, recall and F1-score. It also outperforms established attention mechanisms such as squeeze-and-excitation networks (SE) and the convolutional block attention module (CBAM) under identical conditions. These results highlight MASA-Net's robustness and effectiveness in addressing class imbalance and structural variability, offering a reliable solution for automated fungal species identification.