Journal Articles
1,479 articles found
A teacher-student based attention network for fine-grained image recognition
1
Authors: Ang Li, Xueyi Zhang, Peilin Li, Bin Kang. Digital Communications and Networks, 2025, Issue 1, pp. 52-59 (8 pages)
The Fine-grained Image Recognition (FGIR) task is dedicated to distinguishing similar sub-categories that belong to the same super-category, such as bird species and car types. In order to highlight visual differences, existing FGIR works often follow two steps: discriminative sub-region localization and local feature representation. However, these works pay less attention to global context information. They neglect the fact that subtle visual differences in challenging scenarios can be highlighted by exploiting the spatial relationship among different sub-regions from a global viewpoint. Therefore, in this paper, we consider both global and local information for FGIR, and propose a collaborative teacher-student strategy to reinforce and unify the two types of information. Our framework is implemented mainly with convolutional neural networks and is referred to as the Teacher-Student Based Attention Convolutional Neural Network (T-S-ACNN). For fine-grained local information, we choose the classic Multi-Attention Network (MA-Net) as our baseline, and propose a type of boundary constraint to further reduce background noise in the local attention maps. In this way, the discriminative sub-regions tend to appear in the area occupied by fine-grained objects, leading to more accurate sub-region localization. For fine-grained global information, we design a graph convolution based Global Attention Network (GA-Net), which can combine extracted local attention maps from MA-Net with non-local techniques to explore the spatial relationship among sub-regions. Finally, we develop a collaborative teacher-student strategy to adaptively determine the attended roles and optimization modes, so as to enhance the cooperative reinforcement of MA-Net and GA-Net. Extensive experiments on the CUB-200-2011, Stanford Cars and FGVC Aircraft datasets illustrate the promising performance of our framework.
Keywords: fine-grained image recognition; collaborative teacher-student strategy; multi-attention; global attention
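The non-local technique mentioned in this abstract relates sub-regions through pairwise attention. As a rough illustration (not the authors' implementation), a minimal self-attention aggregation over sub-region features can be sketched with numpy:

```python
import numpy as np

def nonlocal_attention(x):
    """Minimal non-local (self-attention) block.

    x: (N, C) array of N sub-region features with C channels.
    Each position aggregates all others, weighted by softmax-normalized
    dot-product similarity, exposing spatial relationships globally.
    """
    scores = x @ x.T / np.sqrt(x.shape[1])       # (N, N) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over positions
    return attn @ x                              # (N, C) aggregated features

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))                  # 4 sub-regions, 8 channels
out = nonlocal_attention(feats)
print(out.shape)  # (4, 8)
```

In a real network the query/key/value projections would be learned; this sketch only shows the aggregation step.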
Visual feature inter-learning for sign language recognition in emergency medicine
2
Authors: WEI Chao, LI Yunpeng, LIU Jingze. Optoelectronics Letters, 2025, Issue 10, pp. 619-625 (7 pages)
Accessible communication based on sign language recognition (SLR) is the key to emergency medical assistance for the hearing-impaired community. Balancing the capture of both local and global information in SLR for emergency medicine poses a significant challenge. To address this, we propose a novel approach based on the inter-learning of visual features between global and local information. Specifically, our method enhances the perception capabilities of the visual feature extractor by strategically leveraging the strengths of convolutional neural networks (CNNs), which are adept at capturing local features, and visual transformers, which perform well at perceiving global features. Furthermore, to mitigate the issue of overfitting caused by the limited availability of sign language data for emergency medical applications, we introduce an enhanced short temporal module for data augmentation through additional subsequences. Experimental results on three publicly available sign language datasets demonstrate the efficacy of the proposed approach.
Keywords: sign language recognition (SLR); visual feature inter-learning; emergency medicine; visual feature extractor
CerfeVPR: Cross-Environment Robust Feature Enhancement for Visual Place Recognition
3
Authors: Lingyun Xiang, Hang Fu, Chunfang Yang. Computers, Materials & Continua, 2025, Issue 7, pp. 325-345 (21 pages)
In the Visual Place Recognition (VPR) task, existing research has leveraged large-scale pre-trained models to improve the performance of place recognition. However, when there are significant environmental differences between query images and reference images, a large number of ineffective local features will interfere with the extraction of key landmark features, leading to the retrieval of visually similar but geographically different images. To address this perceptual aliasing problem caused by changes in environmental conditions, we propose a novel visual place recognition method with Cross-Environment Robust Feature Enhancement (CerfeVPR). This method uses a GAN to generate images similar to the original images under different environmental conditions, thereby enhancing the learning of robust features of the original images. This enables the global descriptor to effectively ignore appearance changes caused by environmental factors such as seasons and lighting, showing better place recognition accuracy than other methods. Meanwhile, we introduce a large-kernel convolution adapter to fine-tune the pre-trained model, obtaining a better image feature representation for subsequent robust feature learning. Then, we process the information of different local regions in the general features through a 3-layer pyramid scene parsing network and fuse it with a token that retains global information to construct a multi-dimensional image feature representation. Based on this, we use the fused features of similar images to drive the robust feature learning of the original images and complete the feature matching between query images and retrieved images. Experiments on multiple commonly used datasets show that our method exhibits excellent performance. On average, CerfeVPR achieves the highest results, with all Recall@N values exceeding 90%. In particular, on the highly challenging Nordland dataset, the R@1 metric is improved by 4.6%, significantly outperforming other methods, which fully verifies the superiority of CerfeVPR in visual place recognition under complex environments.
Keywords: visual place recognition; cross-environment robustness; pre-trained model; feature learning
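Recall@N, the metric reported in this abstract, counts a query as correct if any of its top-N retrieved references is a true match. A small illustrative implementation (function and variable names are ours, not from the paper):

```python
import numpy as np

def recall_at_n(dists, ground_truth, n):
    """Recall@N for retrieval-based place recognition.

    dists: (Q, R) distance matrix between Q queries and R references.
    ground_truth: list of sets of correct reference indices per query.
    A query scores 1 if any of its n nearest references is correct.
    """
    hits = 0
    for q, gt in enumerate(ground_truth):
        top_n = np.argsort(dists[q])[:n]        # indices of n nearest refs
        if gt & set(top_n.tolist()):            # any correct match in top n?
            hits += 1
    return hits / len(ground_truth)

dists = np.array([[0.1, 0.9, 0.5],
                  [0.8, 0.2, 0.3]])
gt = [{0}, {2}]
print(recall_at_n(dists, gt, 1))  # 0.5: query 1's nearest neighbor is wrong
print(recall_at_n(dists, gt, 2))  # 1.0: the correct ref enters the top 2
```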
A Novel CAPTCHA Recognition System Based on Refined Visual Attention
4
Authors: Zaid Derea, Beiji Zou, Xiaoyan Kui, Monir Abdullah, Alaa Thobhani, Amr Abdussalam. Computers, Materials & Continua, 2025, Issue 4, pp. 115-136 (22 pages)
Improving website security to prevent malicious online activities is crucial, and CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) has emerged as a key strategy for distinguishing human users from automated bots. Text-based CAPTCHAs, designed to be easily decipherable by humans yet challenging for machines, are a common form of this verification. However, advancements in deep learning have facilitated the creation of models adept at recognizing these text-based CAPTCHAs with surprising efficiency. In our comprehensive investigation into CAPTCHA recognition, we have tailored the renowned UpDown image captioning model specifically for this purpose. Our approach innovatively combines an encoder to extract both global and local features, significantly boosting the model's capability to identify complex details within CAPTCHA images. For the decoding phase, we have adopted a refined attention mechanism, integrating enhanced visual attention with dual layers of Long Short-Term Memory (LSTM) networks to elevate CAPTCHA recognition accuracy. Our rigorous testing across four varied datasets, including those from Weibo, BoC, Gregwar, and Captcha 0.3, demonstrates the versatility and effectiveness of our method. The results not only highlight the efficiency of our approach but also offer profound insights into its applicability across different CAPTCHA types, contributing to a deeper understanding of CAPTCHA recognition technology.
Keywords: text-based CAPTCHA recognition; refined visual attention; web security; computer vision
A Dual-Layer Attention Based CAPTCHA Recognition Approach with Guided Visual Attention
5
Authors: Zaid Derea, Beiji Zou, Xiaoyan Kui, Alaa Thobhani, Amr Abdussalam. Computer Modeling in Engineering & Sciences, 2025, Issue 3, pp. 2841-2867 (27 pages)
Enhancing website security is crucial to combat malicious activities, and CAPTCHA (Completely Automated Public Turing tests to tell Computers and Humans Apart) has become a key method to distinguish humans from bots. While text-based CAPTCHAs are designed to challenge machines while remaining human-readable, recent advances in deep learning have enabled models to recognize them with remarkable efficiency. In this regard, we propose a novel two-layer visual attention framework for CAPTCHA recognition that builds on traditional attention mechanisms by incorporating Guided Visual Attention (GVA), which sharpens focus on relevant visual features. We have specifically adapted the well-established image captioning task to address this need. Our approach utilizes the first-level attention module as guidance for the second-level attention component, incorporating two LSTM (Long Short-Term Memory) layers to enhance CAPTCHA recognition. Our extensive evaluation across four diverse datasets (Weibo, BoC (Bank of China), Gregwar, and Captcha 0.3) shows the adaptability and efficacy of our method. Our approach demonstrated impressive performance, achieving an accuracy of 96.70% for BoC and 95.92% for Weibo. These results underscore the effectiveness of our method in accurately recognizing and processing CAPTCHA datasets, showcasing its robustness, reliability, and ability to handle varied challenges in CAPTCHA recognition.
Keywords: text-based CAPTCHA; image recognition; guided visual attention; web security; computer vision
DIEONet:Domain-Invariant Information Extraction and Optimization Network for Visual Place Recognition
6
Authors: Shaoqi Hou, Zebang Qin, Chenyu Wu, Guangqiang Yin, Xinzhong Wang, Zhiguo Wang. Computers, Materials & Continua, 2025, Issue 3, pp. 5019-5033 (15 pages)
Visual Place Recognition (VPR) technology aims to use visual information to judge the location of agents, which plays an irreplaceable role in tasks such as loop closure detection and relocation. Previous VPR algorithms emphasize the extraction and integration of general image features, while ignoring the mining of salient features that play a key role in the discrimination of VPR tasks. To this end, this paper proposes a Domain-invariant Information Extraction and Optimization Network (DIEONet) for VPR. The core of the algorithm is a newly designed Domain-invariant Information Mining Module (DIMM) and a Multi-sample Joint Triplet Loss (MJT Loss). Specifically, DIMM incorporates the interdependence between different spatial regions of the feature map in the cascaded convolutional unit group, which enhances the model's attention to the domain-invariant static object class. MJT Loss introduces the "joint processing of multiple samples" mechanism into the original triplet loss, and adds a new distance constraint term for positive and negative samples, so that the model can avoid falling into a local optimum during training. We demonstrate the effectiveness of our algorithm by conducting extensive experiments on several authoritative benchmarks. In particular, the proposed method achieves the best performance on the TokyoTM dataset with a Recall@1 metric of 92.89%.
Keywords: visual place recognition; domain-invariant information mining module; multi-sample joint triplet loss
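The MJT Loss above extends the classic triplet loss; since the abstract describes the multi-sample and extra distance-constraint terms only qualitatively, the sketch below shows just the standard triplet loss that it builds on (illustrative, not the paper's formulation):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Classic triplet loss: push the anchor-negative distance to exceed
    the anchor-positive distance by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])   # anchor embedding
p = np.array([0.1, 0.0])   # positive: close to the anchor
n = np.array([1.0, 0.0])   # negative: far from the anchor
print(triplet_loss(a, p, n))  # 0.0: already separated by more than the margin
```

The "joint processing of multiple samples" idea would, in effect, evaluate such terms over several positives and negatives per anchor rather than one triplet at a time.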
Fine-Grained Ship Recognition Based on Visible and Near-Infrared Multimodal Remote Sensing Images: Dataset, Methodology and Evaluation (Cited by 1)
7
Authors: Shiwen Song, Rui Zhang, Min Hu, Feiyao Huang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 6, pp. 5243-5271 (29 pages)
Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. Currently, with the emergence of massive high-resolution multi-modality images, the use of multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on dataset samples. The key to the problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model pinpoint the key information in an image, resulting in a significant improvement in performance. In this paper, a dataset for fine-grained recognition of ships based on visible and near-infrared multi-modality remote sensing images is first proposed, named the Dataset for Multimodal Fine-grained Recognition of Ships (DMFGRS). It includes 1,635 pairs of visible and near-infrared remote sensing images divided into 20 categories, collated from digital orthophoto models provided by commercial remote sensing satellites. DMFGRS provides two types of annotation format files, as well as segmentation mask images corresponding to the ship targets. Then, a Multimodal Information Cross-Enhancement Network (MICE-Net), fusing features of visible and near-infrared remote sensing images, is proposed. In the network, a dual-branch feature extraction and fusion module is designed to obtain more expressive features. The Feature Cross Enhancement Module (FCEM) achieves fusion enhancement of the two modal features by making channel attention and spatial attention work cross-functionally on the feature map. A benchmark is established by evaluating state-of-the-art object recognition algorithms on DMFGRS. In experiments on DMFGRS, MICE-Net reached a precision, recall, mAP0.5 and mAP0.5:0.95 of 87%, 77.1%, 83.8% and 63.9%, respectively. Extensive experiments demonstrate that the proposed MICE-Net performs excellently on DMFGRS. Built on the lightweight YOLO network, the model has excellent generalizability and thus good potential for application in real-life scenarios.
Keywords: multi-modality dataset; ship recognition; fine-grained recognition; attention mechanism
Fine-Grained Action Recognition Based on Temporal Pyramid Excitation Network (Cited by 1)
8
Authors: Xuan Zhou, Jianping Yi. Intelligent Automation & Soft Computing (SCIE), 2023, Issue 8, pp. 2103-2116 (14 pages)
Mining more discriminative temporal features to enrich the temporal context representation is considered the key to fine-grained action recognition. Previous action recognition methods utilize a fixed spatiotemporal window to learn local video representations. However, these methods fail to capture complex motion patterns due to their limited receptive field. To solve the above problems, this paper proposes a lightweight Temporal Pyramid Excitation (TPE) module to capture short-, medium-, and long-term temporal context. In this method, the Temporal Pyramid (TP) module can effectively expand the temporal receptive field of the network by using multi-temporal kernel decomposition without significantly increasing the computational cost. In addition, the Multi-Excitation module can emphasize temporal importance to enhance temporal feature representation learning. TPE can be integrated into ResNet50 to build a compact video learning framework, TPENet. Extensive validation experiments on several challenging benchmark datasets (Something-Something V1, Something-Something V2, UCF-101, and HMDB51) demonstrate that our method achieves a preferable balance between computation and accuracy.
Keywords: fine-grained action recognition; temporal pyramid excitation module; temporal receptive field; multi-excitation module
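The pyramid idea above, covering short, medium, and long temporal ranges with different kernel extents, can be sketched as parallel 1-D convolutions whose outputs are fused. This is a simplified illustration with box filters, not the TPE module itself:

```python
import numpy as np

def temporal_pyramid(x, kernel_sizes=(3, 5, 7)):
    """x: (T,) 1-D temporal signal for one channel.

    Averages responses of box filters of several temporal extents,
    mimicking short/medium/long-term receptive fields in one module.
    """
    outs = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k                       # box filter of width k
        outs.append(np.convolve(x, kernel, mode="same"))
    return np.mean(outs, axis=0)                      # fuse the pyramid levels

signal = np.array([0., 0., 1., 0., 0., 0., 0., 0.])   # an impulse "event"
out = temporal_pyramid(signal)
print(out.shape)  # (8,): same length, but each step now sees 3-7 neighbors
```

A real implementation would use learned (e.g., depthwise) kernels per channel; the pyramid structure is the point being illustrated.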
Fine-grained Ship Image Recognition Based on BCNN with Inception and AM-Softmax
9
Authors: Zhilin Zhang, Ting Zhang, Zhaoying Liu, Peijie Zhang, Shanshan Tu, Yujian Li, Muhammad Waqas. Computers, Materials & Continua (SCIE, EI), 2022, Issue 10, pp. 1527-1539 (13 pages)
The fine-grained ship image recognition task aims to identify various classes of ships. However, small inter-class differences, large intra-class differences between ships, and a lack of training samples make the task difficult. Therefore, to enhance the accuracy of fine-grained ship image recognition, we design a fine-grained ship image recognition network based on a bilinear convolutional neural network (BCNN) with Inception and additive margin Softmax (AM-Softmax). This network improves the BCNN in two aspects. Firstly, introducing Inception branches into the BCNN network helps enhance its ability to extract comprehensive features from ships. Secondly, by adding margin values to the decision boundary, the AM-Softmax function can better extend the inter-class differences and reduce the intra-class differences. In addition, as there are few publicly available datasets for fine-grained ship image recognition, we construct a Ship-43 dataset containing 47,300 ship images belonging to 43 categories. Experimental results on the constructed Ship-43 dataset demonstrate that our method can effectively improve the accuracy of ship image recognition, which is 4.08% higher than the BCNN model. Moreover, comparison results on three other public fine-grained datasets (CUB, Cars, and Aircraft) further validate the effectiveness of the proposed method.
Keywords: fine-grained ship image recognition; Inception; AM-Softmax; BCNN
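AM-Softmax, referenced above, subtracts a fixed margin m from the target-class cosine logit and scales the logits by s before the softmax, which widens inter-class gaps. A minimal numpy sketch (the hyperparameter values here are illustrative, not taken from the paper):

```python
import numpy as np

def am_softmax_loss(cosines, label, s=30.0, m=0.35):
    """cosines: (K,) cosine similarities between a normalized feature
    and K normalized class weight vectors.

    Subtracting margin m from the true-class cosine forces the model to
    separate classes by more than m; scale s sharpens the softmax.
    """
    logits = s * cosines
    logits[label] = s * (cosines[label] - m)   # penalize the target class
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[label])

cosines = np.array([0.8, 0.3, 0.1])
loss_with_margin = am_softmax_loss(cosines, label=0)
loss_plain = am_softmax_loss(cosines, label=0, m=0.0)
print(loss_with_margin > loss_plain)  # True: the margin makes the task harder
```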
METHODS OF VISUAL RECOGNITION, POSITIONING AND ORIENTATING OF 3D SIMPLE GEOMETRIC WORKPIECE
10
Authors: Wang Xiangjun, Wang Yizhong, Ye Shenghua. Transactions of Tianjin University (EI, CAS), 1998, Issue 2, pp. 36-40 (5 pages)
The methods of visual recognition, positioning and orientating of simple 3D geometric workpieces are presented in this paper. The principle and operating process of multiple orientation run-length coding, based on general orientation run-length coding, and the visual recognition method are described in detail. The method of positioning and orientating based on the moment of inertia of the workpiece binary image is also described. It has been applied in research on a flexible automatic coordinate measuring system formed by integrating computer aided design, computer vision and computer aided inspection planning with a coordinate measuring machine. The results show that integrating computer vision with a measurement system is a feasible and effective approach to improving its flexibility and automation.
Keywords: automatic measurement; visual recognition; visual positioning; visual orientating; coordinate measuring machine
Detection and Recognition of Spray Code Numbers on Can Surfaces Based on OCR
11
Authors: Hailong Wang, Junchao Shi. Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 1109-1128 (20 pages)
A two-stage deep learning algorithm for the detection and recognition of can-bottom spray codes and numbers is proposed to address the problems of small character areas and fast production line speeds in can-bottom spray code number recognition. In the coding number detection stage, a Differentiable Binarization Network is used as the backbone network, combined with the Attention and Dilation Convolutions Path Aggregation Network feature fusion structure to enhance the detection effect. For text recognition, end-to-end training with the Scene Visual Text Recognition coding number recognition network can alleviate coding recognition errors caused by image color distortion due to variations in lighting and background noise. In addition, model pruning and quantization are used to reduce the number of model parameters to meet deployment requirements in resource-constrained environments. A comparative experiment was conducted using a dataset of can-bottom spray code numbers collected on-site, and a transfer experiment was conducted using a dataset of packaging box production dates. The experimental results show that the proposed algorithm can effectively locate the coding of cans at different positions on the roller conveyor, and can accurately identify the coding numbers at high production line speeds. The Hmean value of the coding number detection is 97.32%, and the accuracy of the coding number recognition is 98.21%. This verifies that the proposed algorithm has high accuracy in coding number detection and recognition.
Keywords: can coding recognition; differentiable binarization network; scene visual text recognition; model pruning and quantization; transport model
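The Differentiable Binarization network mentioned above replaces the hard binarization step of text detection with a steep sigmoid of the difference between a probability map and a learned threshold map, so the step becomes trainable end-to-end. A small numeric sketch (the steepness k here is illustrative):

```python
import numpy as np

def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binarization B = 1 / (1 + exp(-k * (P - T))).

    Output is near-binary for a large k, yet differentiable with
    respect to both the probability map P and the threshold map T.
    """
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))

p = np.array([[0.9, 0.2],
              [0.6, 0.4]])           # text-probability map
t = np.full_like(p, 0.5)             # threshold map (learned in practice; fixed here)
b = differentiable_binarization(p, t)
print(np.round(b, 3))                # pixels above/below threshold saturate to ~1/~0
```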
Visualization of flatness pattern recognition based on T-S cloud inference network (Cited by 2)
12
Authors: Zhang Xiuling, Zhao Liang, Zang Jiayin, Fan Hongmin. Journal of Central South University (SCIE, EI, CAS, CSCD), 2015, Issue 2, pp. 560-566 (7 pages)
Flatness pattern recognition is the key to flatness control. The accuracy of present flatness pattern recognition methods is limited and shape defects cannot be reflected intuitively. To improve this, a novel method via a T-S cloud inference network optimized by a genetic algorithm (GA) is proposed. The T-S cloud inference network is constructed from a T-S fuzzy neural network and the cloud model, so both the rapidity of fuzzy logic and the cloud model's handling of uncertain data are taken into account. Moreover, GA possesses a good parallel design structure and global optimization characteristics. Compared with the simulated recognition results of the traditional BP algorithm, the GA-optimized method is more accurate and effective. Furthermore, virtual reality technology is introduced into the field of shape control through LabVIEW/MATLAB mixed programming, and a virtual flatness pattern recognition interface is designed. Therefore, engineering analysis data and the actual model are combined with each other, and shape defects can be seen more vividly and intuitively.
Keywords: pattern recognition; T-S cloud inference network; cloud model; mixed programming; virtual reality; visual recognition
Analysis on the visual recognition effect of Tibetan-Chinese bilingual guide signs (Cited by 1)
13
Authors: Chen Fei, Su Guanhong, Zhang Danni, Wang Chenzhu, Zhang Yunlong, Bo Wu. Journal of Southeast University (English Edition) (EI, CAS), 2023, Issue 3, pp. 292-300 (9 pages)
The influence of Tibetan characters on the visual recognition effects of Tibetan-Chinese bilingual guide signs was studied based on drivers' visual characteristics. Four versions of Tibetan-Chinese bilingual guide signs with different heights and aspect ratios of Tibetan characters were designed, and corresponding road simulation models were established. 10 Tibetan drivers and 10 Han drivers were selected to conduct driving simulation experiments using a driving simulator and eye tracker. The participants' pupil diameters and visual recognition durations obtained from the eye tracker system were analyzed by analysis of variance. Combining the statistical analysis of driving simulator data with the questionnaire results on the visual recognition experience, it can be concluded that, for Tibetan drivers, when the height of Tibetan characters was 2/3 of the height of Chinese characters, the visual recognition effect of the signs was better than at 1/3 or 1/2 of the height of Chinese characters, indicating that increasing the height of Tibetan characters is conducive to improving the visual recognition effect of guide signs. The aspect ratio of the Tibetan characters had no significant effect on the difficulty of drivers' visual recognition, but it affects the aesthetics of the bilingual guide signs. The recommended Tibetan character height should be increased to improve the visual recognition process for Tibetan drivers.
Keywords: Tibetan-Chinese bilingual guide sign; driving simulation; visual recognition characteristics
DWDet: A Fine-Grained Object Detection Algorithm for Remote Sensing Aircraft
14
Authors: Meijing Gao, Yonghao Yan, Xiangrui Fan, Huanyu Sun, Sibo Chen, Xu Chen, Bingzhou Sun, Ning Guan. Journal of Beijing Institute of Technology, 2025, Issue 4, pp. 337-349 (13 pages)
Fine-grained aircraft target detection in remote sensing holds significant research value and practical applications, particularly in military defense and precision strikes. Given the complexity of remote sensing images, where targets are often small and similar within categories, detecting these fine-grained targets is challenging. To address this, we constructed a fine-grained dataset of remotely sensed airplanes; for the problems of remote sensing fine-grained targets with obvious head-to-tail distributions and large variations in target sizes, we proposed the DWDet fine-grained target detection and recognition algorithm. First, for the problem of unbalanced category distribution, we adopt an adaptive sampling strategy. In addition, we construct a deformable convolutional block and improve the decoupling head structure to improve the detection effect of the model on deformed targets. Then, we design a localization loss function, which is used to improve the model's localization ability for targets of different scales. The experimental results show that our algorithm improves the overall accuracy of the model by 4.1% compared to the baseline model, and improves the detection accuracy of small targets by 12.2%. The ablation and comparison experiments also prove the effectiveness of our algorithm.
Keywords: remote sensing; fine-grained recognition; aircraft; remote-sensing datasets; multi-scale target detection
VTAN: A Novel Video Transformer Attention-Based Network for Dynamic Sign Language Recognition
15
Authors: Ziyang Deng, Weidong Min, Qing Han, Mengxue Liu, Longfei Li. Computers, Materials & Continua, 2025, Issue 2, pp. 2793-2812 (20 pages)
Dynamic sign language recognition holds significant importance, particularly with the application of deep learning to address its complexity. However, existing methods face several challenges. Firstly, recognizing dynamic sign language requires identifying keyframes that best represent the signs, and missing these keyframes reduces accuracy. Secondly, some methods do not focus enough on hand regions, which are small within the overall frame, leading to information loss. To address these challenges, we propose a novel Video Transformer Attention-based Network (VTAN) for dynamic sign language recognition. Our approach prioritizes informative frames and hand regions effectively. To tackle the first issue, we designed a keyframe extraction module enhanced by a convolutional autoencoder, which focuses on selecting information-rich frames and eliminating redundant ones from the video sequences. For the second issue, we developed a soft attention-based transformer module that emphasizes extracting features from hand regions, ensuring that the network pays more attention to hand information within sequences. This dual-focus approach improves effective dynamic sign language recognition by addressing the key challenges of identifying critical frames and emphasizing hand regions. Experimental results on two public benchmark datasets demonstrate the effectiveness of our network, outperforming most of the typical methods in sign language recognition tasks.
Keywords: dynamic sign language recognition; transformer; soft attention; attention-based visual feature aggregation
Design of an Intelligent Robotic Excavator Based on Binocular Visual Recognition Technique (Cited by 1)
16
Authors: ZHANG Xin, LIU Jing, WEN Huai-xing. International Journal of Plant Engineering and Management, 2009, Issue 1, pp. 48-51 (4 pages)
Research on intelligent and robotic excavators has become a focus both at home and abroad, and this type of excavator is becoming increasingly important in application. In this paper, we developed a control system which enables the intelligent robotic excavator to perform excavating operations autonomously. It can recognize excavating targets by itself, program the operation automatically based on the original parameters, and finish all the tasks. Experimental results indicate the validity of the real-time performance and precision of the control system. The intelligent robotic excavator can remarkably reduce labor intensity and enhance working efficiency.
Keywords: excavating robot; binocular visual recognition; distributed control system; trajectory tracing
Visual recognition of melamine in milk via selective metallo-hydrogel formation
17
Authors: Xiaoling Bao, Jianhong Liu, Qingshu Zheng, Wei Pei, Yimei Yang, Yanyun Dai, Tao Tu. Chinese Chemical Letters (SCIE, CAS, CSCD), 2019, Issue 12, pp. 2266-2270 (5 pages)
A series of novel six-coordinated terpyridine zinc complexes, containing ammonium salts and a thymine fragment at the two terminals, have been designed and synthesized; they can function as highly sensitive visualized sensors for melamine detection via selective metallo-hydrogel formation. After full characterization by various techniques, the complementary triple hydrogen bonding between the thymine fragment and melamine, as well as π-π stacking interactions, may be held responsible for the selective metallo-hydrogel formation. In light of possible interference from milk ingredients (proteins, peptides and amino acids) and legal/illegal additives (urine, sugars and vitamins), a series of control experiments was carried out. To our delight, this visual recognition is highly selective: no gelation was observed with the selected milk ingredients or additives. Remarkably, this newly developed protocol enables convenient and highly selective visual recognition of melamine at a concentration as low as 10 ppm in raw milk without any tedious pretreatment.
Keywords: hydrogen-bonding interaction; pincer zinc complex; melamine; metallo-hydrogel; visual recognition
Preliminary study on visual recognition under low visibility conditions caused by artificial dynamic smog
18
Authors: Xu-Hong Zhang, Zhe-Yi Chen, Bin-Bin Su, Karunanedi Soobraydoo, Hao-Ran Wu, Qin-Zhuan Ren, Lu Sun, Fan Lyu, Jun Jiang. International Journal of Ophthalmology (English edition) (SCIE, CAS), 2018, No. 11, pp. 1821-1828 (8 pages)
AIM: To quantitatively evaluate the effect of a simulated smog environment on human visual function by psychophysical methods. METHODS: The smog environment was simulated in a 40 cm × 40 cm × 60 cm glass chamber filled with a PM2.5 aerosol, and 14 subjects with normal visual function were examined by psychophysical methods with the foggy smog box placed in front of their eyes. The transmission of light through the smog box, an indication of the percentage concentration of smog, was determined with a luminance meter. Visual function under different smog concentrations was evaluated by E-visual acuity, crowded E-visual acuity and contrast sensitivity. RESULTS: E-visual acuity, crowded E-visual acuity and contrast sensitivity were all impaired with a decrease in the transmission rate (TR) according to power functions, with invariable exponents of -1.41, -1.62 and -0.70, respectively, and R² values of 0.99 for E-visual acuity and crowded E-visual acuity, and 0.96 for contrast sensitivity. Crowded E-visual acuity decreased faster than E-visual acuity. There was a good correlation between the TR, the extinction coefficient and visibility under heavy-smog conditions. CONCLUSION: Increases in smog concentration have a strong effect on visual function.
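The power-function relationship reported in this abstract can be sketched directly: each visual-function threshold scales as TR^k, with the exponents k = -1.41, -1.62 and -0.70 taken from the abstract. The normalization to TR = 1.0 and the function name are illustrative assumptions, since the abstract does not give the absolute scale factors.

```python
# Exponents reported in the abstract for the power-law fits threshold ∝ TR^k.
EXPONENTS = {
    "e_acuity": -1.41,
    "crowded_e_acuity": -1.62,
    "contrast_sensitivity": -0.70,
}

def degradation_factor(tr: float, exponent: float) -> float:
    """Relative impairment at transmission rate `tr`, normalized to tr = 1.0
    (clear air). With a negative exponent the factor grows as TR falls,
    i.e. the visual threshold worsens in denser smog."""
    if not 0.0 < tr <= 1.0:
        raise ValueError("transmission rate must be in (0, 1]")
    return tr ** exponent
```

For example, at 50% transmission the crowded E-acuity threshold degrades by a factor of 0.5^-1.62 ≈ 3.07, versus 0.5^-1.41 ≈ 2.66 for plain E-acuity, consistent with the abstract's observation that crowded acuity decreases faster.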
Keywords: visual recognition; low-visibility conditions; artificial smog
Deep Learning-Based Approach for Arabic Visual Speech Recognition
19
Authors: Nadia H. Alsulami, Amani T. Jamal, Lamiaa A. Elrefaei. Computers, Materials & Continua (SCIE, EI), 2022, No. 4, pp. 85-108 (24 pages)
Lip-reading technologies are rapidly progressing following the breakthrough of deep learning. They play a vital role in many applications, such as human-machine communication and security. In this paper, we propose an effective lip-reading recognition model for Arabic visual speech recognition based on deep learning algorithms. The Arabic visual datasets that have been collected contain 2400 records of Arabic digits and 960 records of Arabic phrases from 24 native speakers. The primary purpose is to provide a high-performance model by enhancing the preprocessing phase. Firstly, we extract keyframes from our dataset. Secondly, we produce Concatenated Frame Images (CFIs) that represent the utterance sequence in one single image. Finally, VGG-19 is employed for visual feature extraction in our proposed model. We have examined different numbers of keyframes: 10, 15, and 20, comparing two approaches in the proposed model: (1) the VGG-19 base model and (2) the VGG-19 base model with batch normalization. The results show that the second approach achieves greater accuracy: 94% for digit recognition, 97% for phrase recognition, and 93% for combined digit and phrase recognition on the test dataset. Therefore, our proposed model based on CFI input achieves superior performance.
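The preprocessing pipeline described above (keyframe extraction, then tiling into a single Concatenated Frame Image) can be sketched as follows. Uniform sampling of keyframes and a 5-column tiling layout are assumptions for illustration; the paper's exact keyframe-selection criterion and grid layout may differ. The resulting CFI would then be resized and passed to a VGG-19 feature extractor.

```python
import numpy as np

def extract_keyframes(frames, n_keyframes=15):
    """Uniformly sample n_keyframes from a clip given as a list of
    (H, W, C) arrays. The paper compares 10, 15 and 20 keyframes;
    uniform sampling here is an illustrative assumption."""
    idx = np.linspace(0, len(frames) - 1, n_keyframes).round().astype(int)
    return [frames[i] for i in idx]

def concatenated_frame_image(keyframes, cols=5):
    """Tile keyframes row-major into one Concatenated Frame Image (CFI),
    so the whole utterance is represented as a single image suitable
    for a CNN feature extractor such as VGG-19."""
    rows = -(-len(keyframes) // cols)  # ceiling division
    h, w, c = keyframes[0].shape
    canvas = np.zeros((rows * h, cols * w, c), dtype=keyframes[0].dtype)
    for k, frame in enumerate(keyframes):
        r, col = divmod(k, cols)
        canvas[r * h:(r + 1) * h, col * w:(col + 1) * w] = frame
    return canvas
```

With 15 keyframes and 5 columns, a clip of 4×4 RGB frames yields a single 12×20×3 CFI (a 3×5 grid).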
Keywords: convolutional neural network; deep learning; lip reading; transfer learning; visual speech recognition
A survey of fine-grained visual categorization based on deep learning
20
Authors: XIE Yuxiang, GONG Quanzhi, LUAN Xidao, YAN Jie, ZHANG Jiahui. Journal of Systems Engineering and Electronics (CSCD), 2024, No. 6, pp. 1337-1356 (20 pages)
Deep learning has achieved excellent results in various computer vision tasks, especially in fine-grained visual categorization, which aims to distinguish the subordinate categories within label-level categories. Due to high intra-class variance and high inter-class similarity, fine-grained visual categorization is extremely challenging. This paper first briefly introduces and analyzes the related public datasets. After that, some of the latest methods are reviewed. Based on the feature types, the feature processing methods, and the overall structure used in the model, we divide them into three types: methods based on a general convolutional neural network (CNN) with strong part-level supervision, methods based on single feature processing, and methods based on multiple feature processing. Most methods of the first type have a relatively simple structure, reflecting the early stage of this research. The methods of the other two types include models with special structures and training processes, which help to obtain discriminative features. We conduct a specific analysis of several methods with high accuracy on public datasets. In addition, we argue that the focus of future research should be reducing existing methods' demand for large amounts of data and computing power. In terms of technology, extracting subtle feature information with the burgeoning vision transformer (ViT) architecture is also an important research direction.
Keywords: deep learning; fine-grained visual categorization; convolutional neural network (CNN); visual attention