The self-attention mechanism of Transformers,which captures long-range contextual information,has demonstrated significant potential in image segmentation.However,their ability to learn local,contextual relationships ...The self-attention mechanism of Transformers,which captures long-range contextual information,has demonstrated significant potential in image segmentation.However,their ability to learn local,contextual relationships between pixels requires further improvement.Previous methods face challenges in efficiently managing multi-scale fea-tures of different granularities from the encoder backbone,leaving room for improvement in their global representation and feature extraction capabilities.To address these challenges,we propose a novel Decoder with Multi-Head Feature Receptors(DMHFR),which receives multi-scale features from the encoder backbone and organizes them into three feature groups with different granularities:coarse,fine-grained,and full set.These groups are subsequently processed by Multi-Head Feature Receptors(MHFRs)after feature capture and modeling operations.MHFRs include two Three-Head Feature Receptors(THFRs)and one Four-Head Feature Receptor(FHFR).Each group of features is passed through these MHFRs and then fed into axial transformers,which help the model capture long-range dependencies within the features.The three MHFRs produce three distinct feature outputs.The output from the FHFR serves as auxiliary auxiliary features in the prediction head,and the prediction output and their losses will eventually be aggregated.Experimental results show that the Transformer using DMHFR outperforms 15 state of the arts(SOTA)methods on five public datasets.Specifically,it achieved significant improvements in mean DICE scores over the classic Parallel Reverse Attention Network(PraNet)method,with gains of 4.1%,2.2%,1.4%,8.9%,and 16.3%on the CVC-ClinicDB,Kvasir-SEG,CVC-T,CVC-ColonDB,and ETIS-LaribPolypDB datasets,respectively.展开更多
Although conventional object detection methods achieve high accuracy through extensively annotated datasets,acquiring such large-scale labeled data remains challenging and cost-prohibitive in numerous real-world appli...Although conventional object detection methods achieve high accuracy through extensively annotated datasets,acquiring such large-scale labeled data remains challenging and cost-prohibitive in numerous real-world applications.Few-shot object detection presents a new research idea that aims to localize and classify objects in images using only limited annotated examples.However,the inherent challenge in few-shot object detection lies in the insufficient sample diversity to fully characterize the sample feature distribution,which consequently impacts model performance.Inspired by contrastive learning principles,we propose an Implicit Feature Contrastive Learning(IFCL)module to address this limitation and augment feature diversity for more robust representational learning.This module generates augmented support sample features in a mixed feature space and implicitly contrasts them with query Region of Interest(RoI)features.This approach facilitates more comprehensive learning of both intra-class feature similarity and inter-class feature diversity,thereby enhancing the model’s object classification and localization capabilities.Extensive experiments on PASCAL VOC show that our method achieves a respective improvement of 3.2%,1.8%,and 2.3%on 10-shot of three Novel Sets compared to the baseline model FPD.展开更多
Classroom behavior recognition is a hot research topic,which plays a vital role in assessing and improving the quality of classroom teaching.However,existing classroom behavior recognition methods have challenges for ...Classroom behavior recognition is a hot research topic,which plays a vital role in assessing and improving the quality of classroom teaching.However,existing classroom behavior recognition methods have challenges for high recognition accuracy with datasets with problems such as scenes with blurred pictures,and inconsistent objects.To address this challenge,we proposed an effective,lightweight object detector method called the RFNet model(YOLO-FR).The YOLO-FR is a lightweight and effective model.Specifically,for efficient multi-scale feature extraction,effective feature pyramid shared convolutional(FPSC)was designed to improve the feature extract performance by leveraging convolutional layers with varying dilation rates from the input image in the backbone.Secondly,to address the problem of multi-scale variability in the scene,we design the Rep Ghost fusion Cross Stage Partial and Efficient Layer Aggregation Network(RGCSPELAN)to improve the network performance further and reduce the amount of computation and the number of parameters.In addition,by conducting experimental valuation on the SCB dataset3 and STBD-08 dataset.Experimental results indicate that,compared to the baseline model,the RFNet model has increased mean accuracy precision(mAP@50)from 69.6%to 71.0%on the SCB dataset3 and from 91.8%to 93.1%on the STBD-08 dataset.The RFNet approach has effectiveness precision at 68.6%,surpassing the baseline method(YOLOv11)at 3.3%and archieve the minimal size(4.9 M)on the SCB dataset3.Finally,comparing it with other algorithms,it accurately detects student behavior in complex classroom environments results confirmed that RFNet is well-suited for real-time and efficiently recognizing classroom behaviors.展开更多
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinfor...In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural networkbased reinforcement learning, thereby potentially leading to more effective policy improvement.展开更多
The fingerspelling recognition by hand shape is an important step for developing a human-computer interaction system. A method of fingerspelling recognition by hand shape using HLAC (higher-order local auto-correlat...The fingerspelling recognition by hand shape is an important step for developing a human-computer interaction system. A method of fingerspelling recognition by hand shape using HLAC (higher-order local auto-correlation) features is proposed. Furthermore, in order to use HLAC features more effectively, the use of image processing techniques: reducing an image resolution, dividing an image, and image pre-processing techniques, is also proposed. The experimental results show that the proposed method is promising.展开更多
Recently, there have been some attempts of Transformer in 3D point cloud classification. In order to reduce computations, most existing methods focus on local spatial attention,but ignore their content and fail to est...Recently, there have been some attempts of Transformer in 3D point cloud classification. In order to reduce computations, most existing methods focus on local spatial attention,but ignore their content and fail to establish relationships between distant but relevant points. To overcome the limitation of local spatial attention, we propose a point content-based Transformer architecture, called PointConT for short. It exploits the locality of points in the feature space(content-based), which clusters the sampled points with similar features into the same class and computes the self-attention within each class, thus enabling an effective trade-off between capturing long-range dependencies and computational complexity. We further introduce an inception feature aggregator for point cloud classification, which uses parallel structures to aggregate high-frequency and low-frequency information in each branch separately. Extensive experiments show that our PointConT model achieves a remarkable performance on point cloud shape classification. Especially, our method exhibits 90.3% Top-1 accuracy on the hardest setting of ScanObjectN N. Source code of this paper is available at https://github.com/yahuiliu99/PointC onT.展开更多
Estimation of velocity profile within mud depth is a long-standing and essential problem in debris flow dynamics.Until now,various velocity profiles have been proposed based on the fitting analysis of experimental mea...Estimation of velocity profile within mud depth is a long-standing and essential problem in debris flow dynamics.Until now,various velocity profiles have been proposed based on the fitting analysis of experimental measurements,but these are often limited by the observation conditions,such as the number of configured sensors.Therefore,the resulting linear velocity profiles usually exhibit limitations in reproducing the temporal-varied and nonlinear behavior during the debris flow process.In this study,we present a novel approach to explore the debris flow velocity profile in detail upon our previous 3D-HBPSPH numerical model,i.e.,the three-dimensional Smoothed Particle Hydrodynamic model incorporating the Herschel-Bulkley-Papanastasiou rheology.Specifically,we propose a stratification aggregation algorithm for interpreting the details of SPH particles,which enables the recording of temporal velocities of debris flow at different mud depths.To analyze the velocity profile,we introduce a logarithmic-based nonlinear model with two key parameters,that a controlling the shape of velocity profile and b concerning its temporal evolution.We verify the proposed velocity profile and explore its sensitivity using 34 sets of velocity data from three individual flume experiments in previous literature.Our results demonstrate that the proposed temporalvaried nonlinear velocity profile outperforms the previous linear profiles.展开更多
This article introduces the concept of load aggregation,which involves a comprehensive analysis of loads to acquire their external characteristics for the purpose of modeling and analyzing power systems.The online ide...This article introduces the concept of load aggregation,which involves a comprehensive analysis of loads to acquire their external characteristics for the purpose of modeling and analyzing power systems.The online identification method is a computer-involved approach for data collection,processing,and system identification,commonly used for adaptive control and prediction.This paper proposes a method for dynamically aggregating large-scale adjustable loads to support high proportions of new energy integration,aiming to study the aggregation characteristics of regional large-scale adjustable loads using online identification techniques and feature extraction methods.The experiment selected 300 central air conditioners as the research subject and analyzed their regulation characteristics,economic efficiency,and comfort.The experimental results show that as the adjustment time of the air conditioner increases from 5 minutes to 35 minutes,the stable adjustment quantity during the adjustment period decreases from 28.46 to 3.57,indicating that air conditioning loads can be controlled over a long period and have better adjustment effects in the short term.Overall,the experimental results of this paper demonstrate that analyzing the aggregation characteristics of regional large-scale adjustable loads using online identification techniques and feature extraction algorithms is effective.展开更多
Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving(AD)system.However,most proposed methods aim at addressing one of the two challenges...Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving(AD)system.However,most proposed methods aim at addressing one of the two challenges mentioned above with a single model.To tackle this dilemma,this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting(STSIGMA),an efficient end-to-end method to jointly and accurately perceive the AD environment and forecast the trajectories of the surrounding traffic agents within a unified framework.ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird’s-eye view(BEV)maps simultaneously.Specifically,an iterative aggregation network is first employed as the scene semantic encoder(SSE)to learn diverse scene information.To preserve dynamic interactions of traffic agents,ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder.Meanwhile,a simple yet efficient feature fusion method to fuse semantic and interaction features into a unified feature space as the input to a novel hierarchical aggregation decoder for downstream prediction tasks is designed.Extensive experiments on the nuScenes data set have demonstrated that the proposed ST-SIGMA achieves significant improvements compared to the state-of-theart(SOTA)methods in terms of scene perception and trajectory forecasting,respectively.Therefore,the proposed approach outperforms SOTA in terms of model generalisation and robustness and is therefore more feasible for deployment in realworld AD scenarios.展开更多
A new method to extract person-independent expression feature based on higher-order singular value decomposition (HOSVD) is proposed for facial expression recognition. Based on the assumption that similar persons ha...A new method to extract person-independent expression feature based on higher-order singular value decomposition (HOSVD) is proposed for facial expression recognition. Based on the assumption that similar persons have similar facial expression appearance and shape, the person-similarity weighted expression feature is proposed to estimate the expression feature of test persons. As a result, the estimated expression feature can reduce the influence of individuals caused by insufficient training data, and hence become less person-dependent. The proposed method is tested on Cohn-Kanade facial expression database and Japanese female facial expression (JAFFE) database. Person-independent experimental results show the superiority of the proposed method over the existing methods.展开更多
As an important part of the new generation of information technology,the Internet of Things(IoT)has been widely concerned and regarded as an enabling technology of the next generation of health care system.The fundus ...As an important part of the new generation of information technology,the Internet of Things(IoT)has been widely concerned and regarded as an enabling technology of the next generation of health care system.The fundus photography equipment is connected to the cloud platform through the IoT,so as to realize the realtime uploading of fundus images and the rapid issuance of diagnostic suggestions by artificial intelligence.At the same time,important security and privacy issues have emerged.The data uploaded to the cloud platform involves more personal attributes,health status and medical application data of patients.Once leaked,abused or improperly disclosed,personal information security will be violated.Therefore,it is important to address the security and privacy issues of massive medical and healthcare equipment connecting to the infrastructure of IoT healthcare and health systems.To meet this challenge,we propose MIA-UNet,a multi-scale iterative aggregation U-network,which aims to achieve accurate and efficient retinal vessel segmentation for ophthalmic auxiliary diagnosis while ensuring that the network has low computational complexity to adapt to mobile terminals.In this way,users do not need to upload the data to the cloud platform,and can analyze and process the fundus images on their own mobile terminals,thus eliminating the leakage of personal information.Specifically,the interconnection between encoder and decoder,as well as the internal connection between decoder subnetworks in classic U-Net are redefined and redesigned.Furthermore,we propose a hybrid loss function to smooth the gradient and deal with the imbalance between foreground and background.Compared with the UNet,the segmentation performance of the proposed network is significantly improved on the premise that the number of parameters is only increased by 2%.When applied to three publicly available datasets:DRIVE,STARE and CHASE DB1,the proposed network achieves the accuracy/F1-score of 96.33%/84.34%,97.12%/83.17%and 97.06%/84.10%,respectively.The experimental results show that the MIA-UNet is superior to the state-of-the-art methods.展开更多
As an indispensable part of identity authentication,offline writer identification plays a notable role in biology,forensics,and historical document analysis.However,identifying handwriting efficiently,stably,and quick...As an indispensable part of identity authentication,offline writer identification plays a notable role in biology,forensics,and historical document analysis.However,identifying handwriting efficiently,stably,and quickly is still challenging due to the method of extracting and processing handwriting features.In this paper,we propose an efficient system to identify writers through handwritten images,which integrates local and global features from similar handwritten images.The local features are modeled by effective aggregate processing,and global features are extracted through transfer learning.Specifically,the proposed system employs a pre-trained Residual Network to mine the relationship between large image sets and specific handwritten images,while the vector of locally aggregated descriptors with double power normalization is employed in aggregating local and global features.Moreover,handwritten image segmentation,preprocessing,enhancement,optimization of neural network architecture,and normalization for local and global features are exploited,significantly improving system performance.The proposed system is evaluated on Computer Vision Lab(CVL)datasets and the International Conference on Document Analysis and Recognition(ICDAR)2013 datasets.The results show that it represents good generalizability and achieves state-of-the-art performance.Furthermore,the system performs better when training complete handwriting patches with the normalization method.The experimental result indicates that it’s significant to segment handwriting reasonably while dealing with handwriting overlap,which reduces visual burstiness.展开更多
基金supported by Xiamen Medical and Health Guidance Project in 2021(No.3502Z20214ZD1070)supported by a grant from Guangxi Key Laboratory of Machine Vision and Intelligent Control,China(No.2023B02).
文摘The self-attention mechanism of Transformers,which captures long-range contextual information,has demonstrated significant potential in image segmentation.However,their ability to learn local,contextual relationships between pixels requires further improvement.Previous methods face challenges in efficiently managing multi-scale fea-tures of different granularities from the encoder backbone,leaving room for improvement in their global representation and feature extraction capabilities.To address these challenges,we propose a novel Decoder with Multi-Head Feature Receptors(DMHFR),which receives multi-scale features from the encoder backbone and organizes them into three feature groups with different granularities:coarse,fine-grained,and full set.These groups are subsequently processed by Multi-Head Feature Receptors(MHFRs)after feature capture and modeling operations.MHFRs include two Three-Head Feature Receptors(THFRs)and one Four-Head Feature Receptor(FHFR).Each group of features is passed through these MHFRs and then fed into axial transformers,which help the model capture long-range dependencies within the features.The three MHFRs produce three distinct feature outputs.The output from the FHFR serves as auxiliary auxiliary features in the prediction head,and the prediction output and their losses will eventually be aggregated.Experimental results show that the Transformer using DMHFR outperforms 15 state of the arts(SOTA)methods on five public datasets.Specifically,it achieved significant improvements in mean DICE scores over the classic Parallel Reverse Attention Network(PraNet)method,with gains of 4.1%,2.2%,1.4%,8.9%,and 16.3%on the CVC-ClinicDB,Kvasir-SEG,CVC-T,CVC-ColonDB,and ETIS-LaribPolypDB datasets,respectively.
基金funded by the China Chongqing Municipal Science and Technology Bureau,grant numbers CSTB2024TIAD-CYKJCXX0009,CSTB2024NSCQ-LZX0043,CSTB2022NSCQ-MSX0288Chongqing Municipal Commission of Housing and Urban-Rural Development,grant number CKZ2024-87+3 种基金the Chongqing University of Technology Graduate Education High-Quality Development Project,grant number gzlsz202401the Chongqing University of Technology—Chongqing LINGLUE Technology Co.,Ltd.Electronic Information(Artificial Intelligence)Graduate Joint Training Basethe Postgraduate Education and Teaching Reform Research Project in Chongqing,grant number yjg213116the Chongqing University of Technology-CISDI Chongqing Information Technology Co.,Ltd.Computer Technology Graduate Joint Training Base.
文摘Although conventional object detection methods achieve high accuracy through extensively annotated datasets,acquiring such large-scale labeled data remains challenging and cost-prohibitive in numerous real-world applications.Few-shot object detection presents a new research idea that aims to localize and classify objects in images using only limited annotated examples.However,the inherent challenge in few-shot object detection lies in the insufficient sample diversity to fully characterize the sample feature distribution,which consequently impacts model performance.Inspired by contrastive learning principles,we propose an Implicit Feature Contrastive Learning(IFCL)module to address this limitation and augment feature diversity for more robust representational learning.This module generates augmented support sample features in a mixed feature space and implicitly contrasts them with query Region of Interest(RoI)features.This approach facilitates more comprehensive learning of both intra-class feature similarity and inter-class feature diversity,thereby enhancing the model’s object classification and localization capabilities.Extensive experiments on PASCAL VOC show that our method achieves a respective improvement of 3.2%,1.8%,and 2.3%on 10-shot of three Novel Sets compared to the baseline model FPD.
基金suported by the Fundamental Research Grant Scheme(FRGS)of Universiti Sains Malaysia,Research Number:FRGS/1/2024/ICT02/USM/02/1.
文摘Classroom behavior recognition is a hot research topic,which plays a vital role in assessing and improving the quality of classroom teaching.However,existing classroom behavior recognition methods have challenges for high recognition accuracy with datasets with problems such as scenes with blurred pictures,and inconsistent objects.To address this challenge,we proposed an effective,lightweight object detector method called the RFNet model(YOLO-FR).The YOLO-FR is a lightweight and effective model.Specifically,for efficient multi-scale feature extraction,effective feature pyramid shared convolutional(FPSC)was designed to improve the feature extract performance by leveraging convolutional layers with varying dilation rates from the input image in the backbone.Secondly,to address the problem of multi-scale variability in the scene,we design the Rep Ghost fusion Cross Stage Partial and Efficient Layer Aggregation Network(RGCSPELAN)to improve the network performance further and reduce the amount of computation and the number of parameters.In addition,by conducting experimental valuation on the SCB dataset3 and STBD-08 dataset.Experimental results indicate that,compared to the baseline model,the RFNet model has increased mean accuracy precision(mAP@50)from 69.6%to 71.0%on the SCB dataset3 and from 91.8%to 93.1%on the STBD-08 dataset.The RFNet approach has effectiveness precision at 68.6%,surpassing the baseline method(YOLOv11)at 3.3%and archieve the minimal size(4.9 M)on the SCB dataset3.Finally,comparing it with other algorithms,it accurately detects student behavior in complex classroom environments results confirmed that RFNet is well-suited for real-time and efficiently recognizing classroom behaviors.
文摘In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural networkbased reinforcement learning, thereby potentially leading to more effective policy improvement.
文摘The fingerspelling recognition by hand shape is an important step for developing a human-computer interaction system. A method of fingerspelling recognition by hand shape using HLAC (higher-order local auto-correlation) features is proposed. Furthermore, in order to use HLAC features more effectively, the use of image processing techniques: reducing an image resolution, dividing an image, and image pre-processing techniques, is also proposed. The experimental results show that the proposed method is promising.
基金supported in part by the Nationa Natural Science Foundation of China (61876011)the National Key Research and Development Program of China (2022YFB4703700)+1 种基金the Key Research and Development Program 2020 of Guangzhou (202007050002)the Key-Area Research and Development Program of Guangdong Province (2020B090921003)。
文摘Recently, there have been some attempts of Transformer in 3D point cloud classification. In order to reduce computations, most existing methods focus on local spatial attention,but ignore their content and fail to establish relationships between distant but relevant points. To overcome the limitation of local spatial attention, we propose a point content-based Transformer architecture, called PointConT for short. It exploits the locality of points in the feature space(content-based), which clusters the sampled points with similar features into the same class and computes the self-attention within each class, thus enabling an effective trade-off between capturing long-range dependencies and computational complexity. We further introduce an inception feature aggregator for point cloud classification, which uses parallel structures to aggregate high-frequency and low-frequency information in each branch separately. Extensive experiments show that our PointConT model achieves a remarkable performance on point cloud shape classification. Especially, our method exhibits 90.3% Top-1 accuracy on the hardest setting of ScanObjectN N. Source code of this paper is available at https://github.com/yahuiliu99/PointC onT.
基金supported by the National Natural Science Foundation of China(Grant No.52078493)the Natural Science Foundation of Hunan Province(Grant No.2022JJ30700)+2 种基金the Natural Science Foundation for Excellent Young Scholars of Hunan(Grant No.2021JJ20057)the Science and Technology Plan Project of Changsha(Grant No.kq2305006)the Innovation Driven Program of Central South University(Grant No.2023CXQD033).
文摘Estimation of velocity profile within mud depth is a long-standing and essential problem in debris flow dynamics.Until now,various velocity profiles have been proposed based on the fitting analysis of experimental measurements,but these are often limited by the observation conditions,such as the number of configured sensors.Therefore,the resulting linear velocity profiles usually exhibit limitations in reproducing the temporal-varied and nonlinear behavior during the debris flow process.In this study,we present a novel approach to explore the debris flow velocity profile in detail upon our previous 3D-HBPSPH numerical model,i.e.,the three-dimensional Smoothed Particle Hydrodynamic model incorporating the Herschel-Bulkley-Papanastasiou rheology.Specifically,we propose a stratification aggregation algorithm for interpreting the details of SPH particles,which enables the recording of temporal velocities of debris flow at different mud depths.To analyze the velocity profile,we introduce a logarithmic-based nonlinear model with two key parameters,that a controlling the shape of velocity profile and b concerning its temporal evolution.We verify the proposed velocity profile and explore its sensitivity using 34 sets of velocity data from three individual flume experiments in previous literature.Our results demonstrate that the proposed temporalvaried nonlinear velocity profile outperforms the previous linear profiles.
基金supported by the State Grid Science&Technology Project(5100-202114296A-0-0-00).
文摘This article introduces the concept of load aggregation,which involves a comprehensive analysis of loads to acquire their external characteristics for the purpose of modeling and analyzing power systems.The online identification method is a computer-involved approach for data collection,processing,and system identification,commonly used for adaptive control and prediction.This paper proposes a method for dynamically aggregating large-scale adjustable loads to support high proportions of new energy integration,aiming to study the aggregation characteristics of regional large-scale adjustable loads using online identification techniques and feature extraction methods.The experiment selected 300 central air conditioners as the research subject and analyzed their regulation characteristics,economic efficiency,and comfort.The experimental results show that as the adjustment time of the air conditioner increases from 5 minutes to 35 minutes,the stable adjustment quantity during the adjustment period decreases from 28.46 to 3.57,indicating that air conditioning loads can be controlled over a long period and have better adjustment effects in the short term.Overall,the experimental results of this paper demonstrate that analyzing the aggregation characteristics of regional large-scale adjustable loads using online identification techniques and feature extraction algorithms is effective.
基金Basic and Advanced Research Projects of CSTC,Grant/Award Number:cstc2019jcyj-zdxmX0008Science and Technology Research Program of Chongqing Municipal Education Commission,Grant/Award Numbers:KJQN202100634,KJZDK201900605National Natural Science Foundation of China,Grant/Award Number:62006065。
文摘Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving(AD)system.However,most proposed methods aim at addressing one of the two challenges mentioned above with a single model.To tackle this dilemma,this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting(STSIGMA),an efficient end-to-end method to jointly and accurately perceive the AD environment and forecast the trajectories of the surrounding traffic agents within a unified framework.ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird’s-eye view(BEV)maps simultaneously.Specifically,an iterative aggregation network is first employed as the scene semantic encoder(SSE)to learn diverse scene information.To preserve dynamic interactions of traffic agents,ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder.Meanwhile,a simple yet efficient feature fusion method to fuse semantic and interaction features into a unified feature space as the input to a novel hierarchical aggregation decoder for downstream prediction tasks is designed.Extensive experiments on the nuScenes data set have demonstrated that the proposed ST-SIGMA achieves significant improvements compared to the state-of-theart(SOTA)methods in terms of scene perception and trajectory forecasting,respectively.Therefore,the proposed approach outperforms SOTA in terms of model generalisation and robustness and is therefore more feasible for deployment in realworld AD scenarios.
基金supported by National Natural Science Foundation of China (6087208460940008)+2 种基金Beijing Training Programming Foundation for the Talents (20081D1600300343)Excellent Young Scholar Research Fund of Beijing Institute of Technology (2007Y0305)Fundamental Research Foundation of Beijing Institute of Technology (20080342005)
文摘A new method to extract person-independent expression feature based on higher-order singular value decomposition (HOSVD) is proposed for facial expression recognition. Based on the assumption that similar persons have similar facial expression appearance and shape, the person-similarity weighted expression feature is proposed to estimate the expression feature of test persons. As a result, the estimated expression feature can reduce the influence of individuals caused by insufficient training data, and hence become less person-dependent. The proposed method is tested on Cohn-Kanade facial expression database and Japanese female facial expression (JAFFE) database. Person-independent experimental results show the superiority of the proposed method over the existing methods.
基金This work was supported in part by the National Natural Science Foundation of China(Nos.62072074,62076054,62027827,61902054)the Frontier Science and Technology Innovation Projects of National Key R&D Program(No.2019QY1405)+2 种基金the Sichuan Science and Technology Innovation Platform and Talent Plan(No.2020JDJQ0020)the Sichuan Science and Technology Support Plan(No.2020YFSY0010)the Natural Science Foundation of Guangdong Province(No.2018A030313354).
文摘As an important part of the new generation of information technology,the Internet of Things(IoT)has been widely concerned and regarded as an enabling technology of the next generation of health care system.The fundus photography equipment is connected to the cloud platform through the IoT,so as to realize the realtime uploading of fundus images and the rapid issuance of diagnostic suggestions by artificial intelligence.At the same time,important security and privacy issues have emerged.The data uploaded to the cloud platform involves more personal attributes,health status and medical application data of patients.Once leaked,abused or improperly disclosed,personal information security will be violated.Therefore,it is important to address the security and privacy issues of massive medical and healthcare equipment connecting to the infrastructure of IoT healthcare and health systems.To meet this challenge,we propose MIA-UNet,a multi-scale iterative aggregation U-network,which aims to achieve accurate and efficient retinal vessel segmentation for ophthalmic auxiliary diagnosis while ensuring that the network has low computational complexity to adapt to mobile terminals.In this way,users do not need to upload the data to the cloud platform,and can analyze and process the fundus images on their own mobile terminals,thus eliminating the leakage of personal information.Specifically,the interconnection between encoder and decoder,as well as the internal connection between decoder subnetworks in classic U-Net are redefined and redesigned.Furthermore,we propose a hybrid loss function to smooth the gradient and deal with the imbalance between foreground and background.Compared with the UNet,the segmentation performance of the proposed network is significantly improved on the premise that the number of parameters is only increased by 2%.When applied to three publicly available datasets:DRIVE,STARE and CHASE DB1,the proposed network achieves the accuracy/F1-score of 96.33%/84.34%,97.12%/83.17%and 97.06%/84.10%,respectively.The experimental results show that the MIA-UNet is superior to the state-of-the-art methods.
基金supported in part by the Postgraduate Research&Practice Innovation Program of Jiangsu Province under Grant KYCX 20_0758in part by the Science and Technology Research Project of Jiangsu Public Security Department under Grant 2020KX005+1 种基金in part by the General Project of Philosophy and Social Science Research in Colleges and Universities in Jiangsu Province under Grant 2022SJYB0473in part by“Cyberspace Security”Construction Project of Jiangsu Provincial Key Discipline during the“14th Five Year Plan”.
文摘As an indispensable part of identity authentication,offline writer identification plays a notable role in biology,forensics,and historical document analysis.However,identifying handwriting efficiently,stably,and quickly is still challenging due to the method of extracting and processing handwriting features.In this paper,we propose an efficient system to identify writers through handwritten images,which integrates local and global features from similar handwritten images.The local features are modeled by effective aggregate processing,and global features are extracted through transfer learning.Specifically,the proposed system employs a pre-trained Residual Network to mine the relationship between large image sets and specific handwritten images,while the vector of locally aggregated descriptors with double power normalization is employed in aggregating local and global features.Moreover,handwritten image segmentation,preprocessing,enhancement,optimization of neural network architecture,and normalization for local and global features are exploited,significantly improving system performance.The proposed system is evaluated on Computer Vision Lab(CVL)datasets and the International Conference on Document Analysis and Recognition(ICDAR)2013 datasets.The results show that it represents good generalizability and achieves state-of-the-art performance.Furthermore,the system performs better when training complete handwriting patches with the normalization method.The experimental result indicates that it’s significant to segment handwriting reasonably while dealing with handwriting overlap,which reduces visual burstiness.