Journal Articles
8 articles found
1. Segment differential aggregation representation and supervised compensation learning of ConvNets for human action recognition
Authors: REN ZiLiang, ZHANG QieShi, CHENG Qin, XU ZhenYu, YUAN Shuai, LUO DeLin. Science China (Technological Sciences), SCIE EI CAS CSCD, 2024, No. 1, pp. 197-208 (12 pages)
With more multi-modal data available for visual classification tasks, human action recognition has become an increasingly attractive topic. However, one of the main challenges is to effectively extract complementary features from different modalities for action recognition. In this work, a novel multimodal supervised learning framework based on convolutional neural networks (ConvNets) is proposed to facilitate extracting compensation features from different modalities for human action recognition. Built on an information aggregation mechanism and deep ConvNets, our recognition framework represents spatial-temporal information from the base modalities through a designed frame difference aggregation spatial-temporal module (FDA-STM), and bridges information from skeleton data through a multimodal supervised compensation block (SCB) that supervises the extraction of compensation features. We evaluate the proposed recognition framework on three human action datasets: NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD. The results demonstrate that our model with FDA-STM and SCB achieves state-of-the-art recognition performance on all three benchmark datasets.
Keywords: action recognition; segment frame difference aggregation; supervised compensation learning; ConvNets
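The core frame-difference aggregation idea can be sketched as follows. This is a minimal NumPy illustration, not the authors' FDA-STM: the segment count and the choice of averaging absolute consecutive-frame differences within each segment are assumptions for illustration.

```python
import numpy as np

def segment_frame_differences(video, num_segments=4):
    """Split a video (T, H, W) into equal temporal segments and, within
    each segment, aggregate consecutive-frame differences by averaging.
    Returns an array of shape (num_segments, H, W)."""
    t = video.shape[0]
    bounds = np.linspace(0, t, num_segments + 1, dtype=int)
    out = []
    for s in range(num_segments):
        seg = video[bounds[s]:bounds[s + 1]]
        # consecutive frame differences capture short-term motion
        diff = np.abs(np.diff(seg, axis=0))
        out.append(diff.mean(axis=0))
    return np.stack(out)

# toy example: a 16-frame "video" of 8x8 grayscale frames
video = np.random.rand(16, 8, 8)
feats = segment_frame_differences(video, num_segments=4)
print(feats.shape)  # (4, 8, 8)
```

Each segment yields one motion map, so a ConvNet downstream sees a fixed-size spatial-temporal summary regardless of clip length.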
2. Impedance flow cytometry empowered by ConvNet algorithm to differentiate bladder cancer cells based on electro-mechanical characteristics
Authors: Shuaihua Zhang, Zhiwen Zheng, Yongqi Chen, Zhihong Zhang, Ziyu Han. Nanotechnology and Precision Engineering, 2025, No. 3, pp. 88-97 (10 pages)
Bladder cancer (BC) is a common malignancy and among the leading causes of cancer death worldwide. Analysis of BC cells is of great significance for clinical diagnosis and disease treatment. Current approaches rely mainly on imaging-based technology, which requires complex staining and sophisticated instrumentation. In this work, we develop a label-free method based on artificial intelligence (AI)-assisted impedance-based flow cytometry (IFC) to differentiate between various BC cells and epithelial cells at single-cell resolution. By applying multiple-frequency excitations, the electrical characteristics of cells, including membrane and nuclear opacities, are extracted, allowing distinctions to be made between epithelial cells and low-grade and high-grade BC cells. Through the use of a constriction channel, the electro-mechanical properties associated with the active deformation behavior of cells are investigated, and it is demonstrated that BC cells have a greater capability of shape recovery, an observation that further increases differentiation accuracy. With the assistance of a convolutional neural network-based AI algorithm, IFC is able to effectively differentiate various BC and epithelial cells with accuracies of over 95%. In addition, different grades of BC cells are successfully differentiated in both spiked mixed samples and bladder tumor tissues.
Keywords: impedance flow cytometry; ConvNet model; differentiation between cells; bladder cancer analysis
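The "opacity" features mentioned in the abstract are conventionally derived from impedance magnitudes at multiple excitation frequencies. A minimal sketch under the usual definition (opacity = higher-frequency |Z| normalized by a low-frequency reference |Z|); the specific frequencies and the toy readings below are illustrative assumptions, not the paper's measurements.

```python
import numpy as np

def opacity(z_high, z_low):
    """Opacity: impedance magnitude at a higher frequency normalized by
    the magnitude at a low reference frequency (roughly size-independent,
    so it reflects internal dielectric properties rather than cell size)."""
    return np.abs(z_high) / np.abs(z_low)

# toy per-cell complex impedance readings at three frequencies
z_low  = np.array([1200 + 300j, 1500 + 350j])   # reference (e.g., 0.5 MHz)
z_mid  = np.array([900 + 250j, 1400 + 300j])    # probes the membrane
z_high = np.array([700 + 200j, 1350 + 280j])    # probes internal structure

membrane_opacity = opacity(z_mid, z_low)
nuclear_opacity = opacity(z_high, z_low)
print(membrane_opacity, nuclear_opacity)
```

Feature vectors built from such opacities (plus the shape-recovery metrics from the constriction channel) are what a ConvNet-style classifier would consume.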
3. Fusing Geometric and Temporal Deep Features for High-Precision Arabic Sign Language Recognition
Authors: Yazeed Alkharijah, Shehzad Khalid, Syed Muhammad Usman, Amina Jameel, Danish Hamid. Computer Modeling in Engineering & Sciences, 2025, No. 7, pp. 1113-1141 (29 pages)
Arabic Sign Language (ArSL) recognition plays a vital role in enhancing communication for the Deaf and Hard of Hearing (DHH) community. Researchers have proposed multiple methods for automated recognition of ArSL; however, these methods face multiple challenges, including high gesture variability, occlusions, limited signer diversity, and the scarcity of large annotated datasets. Existing methods, often relying solely on either skeletal data or video-based features, struggle with generalization and robustness, especially in dynamic and real-world conditions. This paper proposes a novel multimodal ensemble classification framework that integrates geometric features derived from 3D skeletal joint distances and angles with temporal features extracted from RGB videos using the Inflated 3D ConvNet (I3D). By fusing these complementary modalities at the feature level and applying a majority-voting ensemble of XGBoost, Random Forest, and Support Vector Machine classifiers, the framework robustly captures both the spatial configurations and the motion dynamics of sign gestures. Feature selection using the Pearson correlation coefficient further enhances efficiency by reducing redundancy. Extensive experiments on the ArabSign dataset, which includes RGB videos and corresponding skeletal data, demonstrate that the proposed approach significantly outperforms state-of-the-art methods, achieving an average F1-score of 97% and improving recognition accuracy by more than 7% over the previous best methods. This work not only advances the technical state of the art in ArSL recognition but also provides a scalable, real-time solution for practical deployment in educational, social, and assistive communication technologies. Although this study focuses on Arabic Sign Language, the proposed framework can be extended to other sign languages, opening the door to potentially worldwide applicability in sign language recognition tasks.
Keywords: Arabic sign language recognition; multimodal feature fusion; ensemble classification; skeletal data; inflated 3D ConvNet (I3D)
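The hard majority-voting step over the three classifiers can be sketched independently of the underlying models. A minimal NumPy version; the per-classifier predictions below are placeholders, not results from the paper.

```python
import numpy as np

def majority_vote(*prediction_sets):
    """Hard majority voting over per-classifier label predictions.
    Each argument is an array of shape (n_samples,). Ties resolve to
    the smallest label, via np.argmax over sorted unique labels."""
    preds = np.stack(prediction_sets)      # (n_classifiers, n_samples)
    n_samples = preds.shape[1]
    voted = np.empty(n_samples, dtype=preds.dtype)
    for i in range(n_samples):
        labels, counts = np.unique(preds[:, i], return_counts=True)
        voted[i] = labels[np.argmax(counts)]
    return voted

# predictions from three hypothetical classifiers (XGBoost, RF, SVM)
xgb_pred = np.array([0, 1, 2, 2])
rf_pred  = np.array([0, 1, 1, 2])
svm_pred = np.array([1, 1, 2, 0])
print(majority_vote(xgb_pred, rf_pred, svm_pred))  # [0 1 2 2]
```

With heterogeneous base learners, hard voting like this often smooths over each model's individual failure modes, which is the rationale the abstract gives for the ensemble.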
4. Blind Face Restoration Based on Swin Transformer and Style-based Generator (Cited by 1)
Authors: 向泽林, 楼旭东, 李旭伟. Journal of Sichuan University (Natural Science Edition), CAS CSCD (PKU Core), 2023, No. 3, pp. 59-67 (9 pages)
Blind face restoration aims to recover high-quality face images from low-quality ones (e.g., blurred, noisy, or compressed images). Since the degradation type and degradation parameters of the low-quality image are unknown in advance, blind face restoration is a highly ill-posed problem that relies heavily on various priors for guidance during restoration. However, facial priors such as facial components and facial landmarks are usually extracted or estimated from the low-quality images themselves and may therefore be inaccurate, which directly harms the final restoration performance and makes these priors difficult to exploit effectively. In addition, current mainstream methods generally rely on ConvNets for feature extraction and do not model long-range features well, so their results lack global consistency. This paper proposes an improved StyleGAN model, named SwinStyleGAN, which uses the Swin Transformer, known for its strong performance on high-level vision tasks, to extract long-range features, and progressively generates images through an improved StyleGAN-like synthesis network. We design a spatial attention transform (SAT) module to re-allocate the per-pixel weights of the features at each stage, further constraining the generator. Extensive experiments show that the proposed method achieves better blind face restoration performance.
Keywords: blind face restoration; ConvNets; Swin Transformer; StyleGAN; spatial attention transform module
5. Design and Implementation of a Deep-Learning-Based Traffic Sign Recognition Smart Car (Cited by 1)
Authors: 熊旋锦, 潘小琴, 唐楷, 康勇. Automation & Instrumentation, 2018, No. 6, pp. 104-108 (5 pages)
Automotive intelligence has become a major direction of automotive technology. To address the inaccurate and untimely traffic sign detection of conventional smart cars in autonomous driving, a deep-learning-based traffic sign detection algorithm is proposed and tested in simulation on a small smart car. A Raspberry Pi serves as the main controller for image processing, with a convolutional network (ConvNet) as the core algorithm for traffic sign classification; an STM32 serves as the control unit, finely controlling the car's forward motion, stopping, and left/right turns according to the image processing results. Simulation of the algorithm shows that its correct classification rate (CCR) has been raised to 98.82%. In simulated tests, the smart car can plan its route at intersections according to traffic signs, achieving active obstacle avoidance and autonomous driving.
Keywords: deep learning; traffic sign recognition; small smart car; STM32; Raspberry Pi; ConvNets
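The decision logic that maps the ConvNet's classification result to a motor command can be sketched as follows. The class names, commands, and confidence threshold are illustrative assumptions; the paper's STM32 firmware is not described at this level of detail.

```python
# hypothetical mapping from recognized traffic-sign class to car action
SIGN_TO_ACTION = {
    "stop": "halt",
    "turn_left": "steer_left",
    "turn_right": "steer_right",
    "straight": "forward",
}

def decide_action(sign_class, confidence, threshold=0.9):
    """Act on a detection only when the classifier is confident;
    otherwise keep the default behavior of driving forward."""
    if confidence >= threshold and sign_class in SIGN_TO_ACTION:
        return SIGN_TO_ACTION[sign_class]
    return "forward"

print(decide_action("stop", 0.97))       # halt
print(decide_action("turn_left", 0.55))  # forward (low confidence)
```

Gating on confidence like this is one simple way to trade the "inaccurate and untimely detection" problem the abstract mentions against false positives that would trigger spurious maneuvers.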
6. Dynamic Hand Gesture Recognition Based on Short-Term Sampling Neural Networks (Cited by 14)
Authors: Wenjin Zhang, Jiacun Wang, Fangping Lan. IEEE/CAA Journal of Automatica Sinica, SCIE EI CSCD, 2021, No. 1, pp. 110-120 (11 pages)
Hand gestures are a natural way for human-robot interaction. Vision-based dynamic hand gesture recognition has become a hot research topic due to its various applications. This paper presents a novel deep learning network for hand gesture recognition. The network integrates several well-proven modules to learn both short-term and long-term features from video inputs while avoiding intensive computation. To learn short-term features, each video input is segmented into a fixed number of frame groups. A frame is randomly selected from each group and represented as an RGB image as well as an optical flow snapshot. These two entities are fused and fed into a convolutional neural network (ConvNet) for feature extraction; the ConvNets for all groups share parameters. To learn long-term features, the outputs from all ConvNets are fed into a long short-term memory (LSTM) network, which predicts the final classification result. The new model has been tested on two popular hand gesture datasets, namely the Jester and Nvidia datasets. Compared with other models, our model produces very competitive results. The robustness of the new model has also been proved on an augmented dataset with enhanced diversity of hand gestures.
Keywords: convolutional neural network (ConvNet); hand gesture recognition; long short-term memory (LSTM) network; short-term sampling; transfer learning
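The short-term sampling step described above — splitting each video into a fixed number of groups and randomly drawing one frame per group — can be sketched in isolation (a minimal version of the sampling only, not the full ConvNet+LSTM pipeline):

```python
import numpy as np

def short_term_sample(num_frames, num_groups, rng=None):
    """Split frame indices [0, num_frames) into num_groups contiguous
    groups and pick one random index from each group, so a clip of any
    length is reduced to a fixed number of representative frames."""
    rng = np.random.default_rng(rng)
    bounds = np.linspace(0, num_frames, num_groups + 1, dtype=int)
    return np.array([rng.integers(bounds[g], bounds[g + 1])
                     for g in range(num_groups)])

# e.g. a 64-frame clip sampled down to 8 frames
idx = short_term_sample(64, 8, rng=0)
print(idx.shape)  # (8,)
```

Because one index is drawn per contiguous group, the sampled frames are temporally ordered and cover the whole clip, which is what lets the downstream LSTM model long-term structure cheaply.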
7. VNLSTM-PoseNet: A novel deep ConvNet for real-time 6-DOF camera relocalization in urban streets (Cited by 6)
Authors: Ming Li, Jiangying Qin, Deren Li, Ruizhi Chen, Xuan Liao, Bingxuan Guo. Geo-Spatial Information Science, SCIE EI CSCD, 2021, No. 3, pp. 422-437 (16 pages)
Image-based relocalization has attracted renewed interest in outdoor environments because it is an important problem with many applications. PoseNet introduced a convolutional neural network (CNN) for the first time to realize real-time camera pose estimation from a single image. To address the precision and robustness problems of PoseNet and its improved variants in complex environments, this paper proposes and implements a new visual relocalization method based on deep convolutional neural networks (VNLSTM-PoseNet). First, the method resizes the input image directly, without cropping, to increase the receptive field of the training images. Then, the images and their corresponding pose labels are fed into an improved Long Short-Term Memory based (LSTM-based) PoseNet for training, and the network is optimized with the Nadam optimizer. Finally, the trained network is used for image localization to obtain the camera pose. Experimental results on outdoor public datasets show that our VNLSTM-PoseNet leads to drastic improvements in relocalization performance compared with existing state-of-the-art CNN-based methods.
Keywords: camera relocalization; pose regression; deep ConvNet; RGB image; camera pose
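Pose regression networks in the PoseNet family are typically trained on a combined position/orientation objective. A minimal sketch of the standard PoseNet-style loss; the weighting factor beta here is a conventional placeholder, and VNLSTM-PoseNet's exact loss may differ.

```python
import numpy as np

def pose_loss(x_pred, x_true, q_pred, q_true, beta=250.0):
    """PoseNet-style loss: Euclidean position error plus a weighted
    orientation error, with the predicted quaternion normalized to
    unit length before comparison."""
    pos_err = np.linalg.norm(x_pred - x_true)
    q_unit = q_pred / np.linalg.norm(q_pred)
    ori_err = np.linalg.norm(q_true - q_unit)
    return pos_err + beta * ori_err

# toy prediction vs. ground truth: 3D position + orientation quaternion
x_pred, x_true = np.array([1.0, 2.0, 3.0]), np.array([1.1, 2.0, 2.9])
q_pred, q_true = np.array([0.9, 0.1, 0.0, 0.0]), np.array([1.0, 0.0, 0.0, 0.0])
print(pose_loss(x_pred, x_true, q_pred, q_true))
```

The scalar beta balances meters against quaternion units; choosing it per scene is one of the known pain points that later PoseNet variants try to remove.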
8. Visual Attention Network (Cited by 106)
Authors: Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu. Computational Visual Media, SCIE EI CSCD, 2023, No. 4, pp. 733-752 (20 pages)
While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglects their 2D structure; (2) the quadratic complexity is too expensive for high-resolution images; (3) it only captures spatial adaptability and ignores channel adaptability. In this paper, we propose a novel linear attention named large kernel attention (LKA) to enable the self-adaptive and long-range correlations of self-attention while avoiding its shortcomings. Furthermore, we present a neural network based on LKA, namely the Visual Attention Network (VAN). While extremely simple, VAN achieves results comparable to similar-size convolutional neural networks (CNNs) and vision transformers (ViTs) in various tasks, including image classification, object detection, semantic segmentation, panoptic segmentation, and pose estimation. For example, VAN-B6 achieves 87.8% accuracy on the ImageNet benchmark and sets new state-of-the-art performance (58.2% PQ) for panoptic segmentation. Besides, VAN-B2 surpasses Swin-T by 4% mIoU (50.1% vs. 46.1%) for semantic segmentation on the ADE20K benchmark and by 2.6% AP (48.8% vs. 46.2%) for object detection on the COCO dataset. It provides a novel method and a simple yet strong baseline for the community. The code is available at https://github.com/Visual-Attention-Network.
Keywords: vision backbone; deep learning; ConvNets; attention
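The essence of large kernel attention — building a long-range attention map with cheap convolutions and using it to modulate the input elementwise rather than via softmax over pairs — can be sketched per channel in NumPy. This is a simplified single-channel illustration with uniform stand-in kernels; the published LKA additionally includes a 1x1 convolution and learned per-channel weights.

```python
import numpy as np

def conv2d_same(x, k, dilation=1):
    """Single-channel 'same' 2D convolution with zero padding,
    supporting dilation (spacing between kernel taps)."""
    kh, kw = k.shape
    ph, pw = dilation * (kh // 2), dilation * (kw // 2)
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            di, dj = i * dilation, j * dilation
            out += k[i, j] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def large_kernel_attention(x, k_local, k_dilated, dilation=3):
    """LKA sketch: a local conv followed by a dilated conv approximates
    a large receptive field at linear cost; the resulting attention map
    modulates the input elementwise (no softmax, hence 'linear')."""
    attn = conv2d_same(conv2d_same(x, k_local), k_dilated, dilation)
    return attn * x

x = np.random.rand(16, 16)
k5 = np.full((5, 5), 1 / 25)   # stand-in local 5x5 kernel
k7 = np.full((7, 7), 1 / 49)   # stand-in dilated 7x7 kernel
y = large_kernel_attention(x, k5, k7)
print(y.shape)  # (16, 16)
```

Composing a 5x5 kernel with a dilation-3 7x7 kernel covers roughly a 21x21 neighborhood while touching only 74 taps per pixel, which is the cost argument behind LKA's design.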