983 articles found
Detection and Recognition of Spray Code Numbers on Can Surfaces Based on OCR
1
Authors: Hailong Wang, Junchao Shi. Source: Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 1109-1128, 20 pages.
A two-stage algorithm based on deep learning for the detection and recognition of can bottom spray codes and numbers is proposed to address the problems of small character areas and fast production line speeds in can bottom spray code number recognition. In the coding number detection stage, the Differentiable Binarization Network is used as the backbone network, combined with the Attention and Dilation Convolutions Path Aggregation Network feature fusion structure to enhance the model's detection performance. For text recognition, training the Scene Visual Text Recognition coding number recognition network end to end alleviates coding recognition errors caused by image color distortion due to variations in lighting and background noise. In addition, model pruning and quantization are used to reduce the number of model parameters to meet deployment requirements in resource-constrained environments. A comparative experiment was conducted using a dataset of tank bottom spray code numbers collected on-site, and a transfer experiment was conducted using a dataset of packaging box production dates. The experimental results show that the proposed algorithm can effectively locate the coding of cans at different positions on the roller conveyor and can accurately identify the coding numbers at high production line speeds. The Hmean value of the coding number detection is 97.32%, and the accuracy of the coding number recognition is 98.21%. This verifies that the proposed algorithm has high accuracy in coding number detection and recognition.
Keywords: Can coding recognition; differentiable binarization network; scene visual text recognition; model pruning and quantification; transport model
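The abstract above reports an Hmean value for detection without spelling the metric out. In text detection, Hmean conventionally denotes the harmonic mean of precision and recall; a minimal sketch under that assumption follows. The function name `hmean` and the sample inputs are illustrative and are not taken from the paper.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the usual text-detection Hmean)."""
    if precision + recall == 0:
        # Both scores are zero: define the harmonic mean as zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical precision/recall values, for illustration only.
    print(hmean(0.98, 0.96))
```

Because the harmonic mean is dominated by the smaller of the two inputs, a detector cannot reach a high Hmean by trading recall for precision or the reverse.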
Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning (Cited by: 1)
2
Authors: Hang He, Chao Ma, Shan Ye, Wenqiang Tang, Yuxuan Zhou, Zhen Yu, Jiaxin Yi, Li Hou, Mingcai Hou. Source: Journal of Earth Science (SCIE, CAS, CSCD), 2024, Issue 3, pp. 1035-1043, 9 pages.
Geological reports are a significant accomplishment for geologists involved in geological investigations and scientific research, as they contain rich data and textual information. With the rapid development of science and technology, a large number of textual reports have accumulated in the field of geology. However, many less popular topics and non-English-speaking regions are neglected in mainstream geoscience databases for geological information mining, making it more challenging for some researchers to extract necessary information from these texts. Natural Language Processing (NLP) has obvious advantages in processing large amounts of textual data. The objective of this paper is to identify geological named entities in Chinese geological texts using NLP techniques. We propose the RoBERTa-Prompt-Tuning-NER method, which leverages the concept of Prompt Learning and requires only a small amount of annotated data to train superior models for recognizing geological named entities in low-resource dataset configurations. The RoBERTa layer captures context-based information and longer-distance dependencies through dynamic word vectors. Finally, we conducted experiments on the constructed Geological Named Entity Recognition (GNER) dataset. Our experimental results show that the proposed model achieves the highest F1 score of 80.64% among the four baseline algorithms, demonstrating the reliability and robustness of using the model for Named Entity Recognition of geological texts.
Keywords: Prompt Learning; Named Entity Recognition (NER); low resource; geological text; text information mining; big data; geology
Deep Learning-Based Natural Language Processing Model and Optical Character Recognition for Detection of Online Grooming on Social Networking Services
3
Authors: Sangmin Kim, Byeongcheon Lee, Muazzam Maqsood, Jihoon Moon, Seungmin Rho. Source: Computer Modeling in Engineering & Sciences, 2025, Issue 5, pp. 2079-2108, 30 pages.
The increased accessibility of social networking services (SNSs) has facilitated communication and information sharing among users. However, it has also heightened concerns about digital safety, particularly for children and adolescents, who are increasingly exposed to online grooming crimes. Early and accurate identification of grooming conversations is crucial in preventing long-term harm to victims. However, research on grooming detection in South Korea remains limited, as existing models are trained primarily on English text and fail to reflect the unique linguistic features of SNS conversations, leading to inaccurate classifications. To address these issues, this study proposes a novel framework that integrates optical character recognition (OCR) technology with KcELECTRA, a deep learning-based natural language processing (NLP) model that shows excellent performance in processing colloquial Korean. In the proposed framework, the KcELECTRA model is fine-tuned on an extensive dataset, including Korean social media conversations, Korean ethical verification data from AI-Hub, and Korean hate speech data from Hugging Face, to enable more accurate classification of text extracted from social media conversation images. Experimental results show that the proposed framework achieves an accuracy of 0.953, outperforming existing transformer-based models. Furthermore, the OCR technology shows high accuracy in extracting text from images, demonstrating that the proposed framework is effective for online grooming detection. The proposed framework is expected to contribute to more accurate detection of grooming text and the prevention of grooming-related crimes.
Keywords: Online grooming; KcELECTRA; natural language processing; optical character recognition; social networking service; text classification
Generating Factual Text via Entailment Recognition Task
4
Authors: Jinqiao Dai, Pengsen Cheng, Jiayong Liu. Source: Computers, Materials & Continua (SCIE, EI), 2024, Issue 7, pp. 547-565, 19 pages.
Generating diverse and factual text is challenging and is receiving increasing attention. By sampling from the latent space, variational autoencoder-based models have recently enhanced the diversity of generated text. However, existing research predominantly depends on summarization models to offer paragraph-level semantic information for enhancing factual correctness. The challenge lies in effectively generating factual text using sentence-level variational autoencoder-based models. In this paper, a novel model called the fact-aware conditional variational autoencoder is proposed to balance the factual correctness and diversity of generated text. Specifically, our model encodes the input sentences and uses them as facts to build a conditional variational autoencoder network. Training this network enables the model to generate text based on input facts. Building upon this foundation, the input text is passed to the discriminator along with the generated text. By employing adversarial training, the model is encouraged to generate text that is indistinguishable to the discriminator, thereby enhancing the quality of the generated text. To further improve factual correctness, inspired by natural language inference systems, an entailment recognition task is introduced and trained together with the discriminator via multi-task learning. Moreover, based on the entailment recognition results, a penalty term is further proposed to reconstruct the loss of our model, forcing the generator to generate text consistent with the facts. Experimental results demonstrate that, compared with competitive models, our model achieves substantial improvements in both the quality and factual correctness of the text while sacrificing only a small amount of diversity. Furthermore, under a comprehensive evaluation of diversity and quality metrics, our model also demonstrates the best performance.
Keywords: text generation; entailment recognition task; natural language processing; artificial intelligence
Region-Aware Fashion Contrastive Learning for Unified Attribute Recognition and Composed Retrieval (Cited by: 1)
5
Authors: WANG Kangping, ZHAO Mingbo. Source: Journal of Donghua University (English Edition) (CAS), 2024, Issue 4, pp. 405-415, 11 pages.
Clothing attribute recognition has become an essential technology, which enables users to automatically identify the characteristics of clothes and search for clothing images with similar attributes. However, existing methods cannot recognize newly added attributes and may fail to capture region-level visual features. To address these issues, a region-aware fashion contrastive language-image pre-training (RaF-CLIP) model was proposed. This model aligned cropped and segmented images with category and multiple fine-grained attribute texts, matching fashion regions to their corresponding texts through contrastive learning. Clothing retrieval found suitable clothing based on the user-specified clothing categories and attributes. To further improve retrieval accuracy, an attribute-guided composed network (AGCN) was introduced as an additional component on RaF-CLIP, specifically designed for composed image retrieval. This task aimed to modify the reference image based on textual expressions to retrieve the expected target. By adopting a transformer-based bidirectional attention and gating mechanism, it realized the fusion and selection of image features and attribute text features. Experimental results show that the proposed model achieves a mean precision of 0.6633 for attribute recognition tasks and a recall@10 of 39.18 for the composed image retrieval task (recall@k is defined as the percentage of correct samples appearing in the top k retrieval results), satisfying user needs for freely searching for clothing through images and texts.
Keywords: attribute recognition; image retrieval; contrastive language-image pre-training (CLIP); image text matching; transformer
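The abstract above defines recall@k as the percentage of correct samples appearing in the top k retrieval results. A minimal sketch of that definition follows, assuming one correct target per query and a ranked list of candidate identifiers per query; the function name `recall_at_k` and the toy data are hypothetical, not from the paper.

```python
from typing import Hashable, Sequence


def recall_at_k(
    ranked_results: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
    k: int,
) -> float:
    """Percentage of queries whose correct target appears in the top-k results."""
    if len(ranked_results) != len(targets):
        raise ValueError("ranked_results and targets must have the same length")
    if not targets:
        raise ValueError("at least one query is required")
    # A query counts as a hit when its target is among the first k candidates.
    hits = sum(
        1 for ranked, target in zip(ranked_results, targets) if target in ranked[:k]
    )
    return 100.0 * hits / len(targets)


if __name__ == "__main__":
    # Toy ranking for two queries; identifiers are made up for illustration.
    ranked = [["a", "b", "c"], ["d", "e", "f"]]
    print(recall_at_k(ranked, ["b", "f"], k=2))
```

Returning a percentage rather than a fraction matches how the abstract states its figure (39.18 for recall@10).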
Named Entity Recognition for Nepali Text Using Support Vector Machines (Cited by: 3)
6
Authors: Surya Bahadur Bam, Tej Bahadur Shahi. Source: Intelligent Information Management, 2014, Issue 2, pp. 21-29, 9 pages.
Named Entity Recognition aims to identify and classify rigid designators in text, such as proper names, biological species, and temporal expressions, into predefined categories. There has been growing interest in this field of research since the early 1990s. Named Entity Recognition has a vital role in different fields of natural language processing, such as Machine Translation, Information Extraction, and Question Answering Systems. In this paper, Named Entity Recognition for Nepali text is presented, based on the Support Vector Machine (SVM), a machine learning approach to classification. A set of features is extracted from the training data set. The accuracy and efficiency of the SVM classifier are analyzed for three different sizes of training data set. The recognition systems are tested with ten datasets of Nepali text. The strength of this work is its efficient feature extraction and comprehensive recognition techniques. The SVM-based Named Entity Recognition is limited to a certain set of features and uses a small dictionary, which affects its performance. The learning performance of the recognition system is observed. It is found that the system can learn well from a small set of training data and that its rate of learning increases as the training size grows.
Keywords: Support Vector Machine; Named Entity Recognition; Machine Learning; Classification; Nepali Language Text
Enhanced Attention-Based Encoder-Decoder Framework for Text Recognition (Cited by: 2)
7
Authors: S. Prabu, K. Joseph Abraham Sundar. Source: Intelligent Automation & Soft Computing (SCIE), 2023, Issue 2, pp. 2071-2086, 16 pages.
Recognizing irregular text in natural images is a challenging task in computer vision. Existing approaches still face difficulties in recognizing irregular text because of its diverse shapes. In this paper, we propose a simple yet powerful irregular text recognition framework based on an encoder-decoder architecture. The proposed framework is divided into four main modules. Firstly, in the image transformation module, a Thin Plate Spline (TPS) transformation is employed to transform the irregular text image into a readable text image. Secondly, we propose a novel Spatial Attention Module (SAM) to compel the model to concentrate on text regions and obtain enriched feature maps. Thirdly, a deep bi-directional long short-term memory (Bi-LSTM) network is used to build a contextual feature map from a visual feature map generated by a Convolutional Neural Network (CNN). Finally, we propose a Dual Step Attention Mechanism (DSAM) integrated with the Connectionist Temporal Classification (CTC)-Attention decoder to re-weight visual features and focus on the intra-sequence relationships to generate a more accurate character sequence. The effectiveness of our proposed framework is verified through extensive experiments on various benchmark datasets, such as SVT, ICDAR, CUTE80, and IIIT5k. The performance of the proposed text recognition framework is analyzed with the accuracy metric. The results demonstrate that our proposed method outperforms the existing approaches on both regular and irregular text. Additionally, the robustness of our approach is evaluated using grocery datasets that contain complex text images, such as GroZi-120, WebMarket, SKU-110K, and Freiburg Groceries. Our framework still produces superior performance on these grocery datasets.
Keywords: Deep learning; text recognition; text normalization; attention mechanism; convolutional neural network (CNN)
Semantic Entity Recognition and Relation Construction Method for Assembly Process Document
8
Authors: 顾星海, 花豹, 刘亚辉, 孙学民, 鲍劲松. Source: Journal of Shanghai Jiaotong University (Science) (EI), 2024, Issue 3, pp. 537-556, 20 pages.
Assembly process documents record the designers' intention or knowledge. However, common knowledge extraction methods are not well suited to assembly process documents because of their tabular form and unstructured natural language texts. In this paper, an assembly semantic entity recognition and relation construction method oriented to assembly process documents is proposed. First, the assembly process sentences are extracted from the table through concerned region recognition and cell division, and they are stored as a key-value object file. Then, the semantic entities in each sentence are identified through a sequence tagging model based on a specific attention mechanism for assembly operation type. Syntactic rules are designed to automatically construct relations between entities. Finally, using a self-constructed corpus, it is shown that the sequence tagging model in the proposed method performs better than the mainstream named entity recognition model when handling assembly process design language. The effectiveness of the proposed method is also analyzed through a simulation experiment in a small-scale real scene and compared with the manual method. The results show that the proposed method can help designers accumulate knowledge automatically and efficiently.
Keywords: assembly process design; knowledge extraction; named entity recognition; text extraction in table; dependency syntactic parsing; attention mechanism
Text Recognition of Barcode Images under Harsh Lighting Conditions (Cited by: 1)
9
Authors: WU Xing, GE Yuxi, ZHANG Qingfeng, CHEN Liming. Source: Wuhan University Journal of Natural Sciences (CAS, CSCD), 2020, Issue 6, pp. 531-537, 7 pages.
The inventory counting of silver ingots plays a key role in silver futures. However, manual inventory counting is time-consuming and labor-intensive. Furthermore, the silver ingots are stored in warehouses with harsh lighting conditions, which makes automatic inventory counting difficult. To meet this challenge, we propose an automatic inventory counting method that integrates object detection and text recognition under harsh lighting conditions. With the help of our own dataset, the barcode on each silver ingot is detected and cropped by the feature pyramid network (FPN). The cropped image is normalized and corrected for text recognition. We use PSENet+CRNN (Progressive Scale Expansion Network, Convolutional Recurrent Neural Network) for text detection and recognition to obtain the serial number from the silver ingot image. Experimental results show that the proposed automatic inventory counting method achieves good results, with the accuracy of the proposed object detection and text recognition under harsh lighting conditions near 99%.
Keywords: barcode; object detection; text recognition; deep learning
Improving CNN-BGRU Hybrid Network for Arabic Handwritten Text Recognition (Cited by: 1)
10
Authors: Sofiene Haboubi, Tawfik Guesmi, Badr M Alshammari, Khalid Alqunun, Ahmed S Alshammari, Haitham Alsaif, Hamid Amiri. Source: Computers, Materials & Continua (SCIE, EI), 2022, Issue 12, pp. 5385-5397, 13 pages.
Handwriting recognition is a challenge that interests many researchers around the world. Handwritten Arabic script in particular poses many challenges that remain to be overcome, given its complex form, its number of letter forms, which exceeds 100, and its cursive nature. Over the past few years, good results have been obtained, but at a high cost in memory and execution time. In this paper, we propose to improve the capacity of the bidirectional gated recurrent unit (BGRU) to recognize Arabic text. The advantage of BGRUs is their execution time compared with other methods, which can have a high success rate but are expensive in terms of time and memory. To test the recognition capacity of the BGRU, the proposed architecture is composed of 6 convolutional neural network (CNN) blocks for feature extraction and 1 BGRU plus 2 dense layers for learning and testing. The experiment is carried out on the entire Institut für Nachrichtentechnik/Ecole Nationale d'Ingénieurs de Tunis (IFN/ENIT) database without any preprocessing or data selection. The obtained results show the ability of BGRUs to recognize handwritten Arabic script.
Keywords: Arabic handwritten script; handwritten text recognition; deep learning; IFN/ENIT; bidirectional GRU; neural network
Mathematical Named Entity Recognition Based on Adversarial Training and Self-Attention
11
Authors: Qiuyu Lai, Wang Kang, Lei Yang, Chun Yang, Delin Zhang. Source: Intelligent Automation & Soft Computing, 2024, Issue 4, pp. 649-664, 16 pages.
Mathematical named entity recognition (MNER) is one of the fundamental tasks in the analysis of mathematical texts. To address the local instability, fuzzy entity boundaries, and long-distance dependence between entities that current neural networks exhibit in the Chinese mathematical entity recognition task, we propose a series of optimization methods and construct an Adversarial Training and Bidirectional long short-term memory-Self-attention Conditional random field (AT-BSAC) model. In our model, the mathematical text is vectorized by the word embedding technique, and small perturbations are added to the word vectors to generate adversarial samples, while local features are extracted by a Bi-directional Long Short-Term Memory (BiLSTM) network. A self-attention mechanism is incorporated to extract more dependency features between entities. The experimental results demonstrate that the AT-BSAC model achieves a precision (P) of 93.88%, a recall (R) of 93.84%, and an F1-score of 93.74%, which is 8.73% higher than the F1-score of the previous Bi-directional Long Short-Term Memory Conditional Random Field (BiLSTM-CRF) model. This demonstrates the effectiveness of the proposed model in mathematical named entity recognition.
Keywords: Named entity recognition; BiLSTM-CRF; adversarial training; self-attentive mechanism; mathematical texts
Digit Recognition in Natural Scene Texts
12
Authors: Shih-Wei Sun. Source: Journal of Electronic Science and Technology (CAS, CSCD), 2017, Issue 2, pp. 199-206, 8 pages.
Digit recognition from natural scene text in video surveillance and broadcasting applications is a challenging research task because the digit to be recognized may be blurred, twisted, set in varying fonts, or non-uniformly colored. In this paper, to solve the digit recognition problem, a principal-axis based topology contour descriptor with support vector machine (SVM) classification is proposed. The contributions of this paper include: a) a local descriptor with SVM classification for digit recognition, b) higher accuracy than the state-of-the-art methods, and c) low computational cost (0.03 second per digit recognition), which makes this method suitable for real-time applications.
Keywords: Digit recognition; scene text; sports video; video surveillance
CNN and Fuzzy Rules Based Text Detection and Recognition from Natural Scenes
13
Authors: T. Mithila, R. Arunprakash, A. Ramachandran. Source: Computer Systems Science & Engineering (SCIE, EI), 2022, Issue 9, pp. 1165-1179, 15 pages.
Scene text detection and recognition is an important research area in image processing. Scene text can appear in different languages, fonts, sizes, colours, orientations and structures. Moreover, the aspect ratios and layouts of scene text may differ significantly. All these variations pose significant challenges for detection and recognition algorithms applied to text in natural scenes. In this paper, a new intelligent text detection and recognition method is proposed for detecting text in natural scenes and recognizing it by applying the newly proposed Conditional Random Field-based fuzzy rules incorporated Convolutional Neural Network (CR-CNN). Moreover, we recommend a new text detection method for detecting the exact text in the input natural scene images. To enhance the presentation of the edge detection process, image pre-processing activities such as edge detection and color modeling have been applied in this work. In addition, we have generated new fuzzy rules for making effective decisions in the processes of text detection and recognition. The experiments have been conducted using standard benchmark datasets such as ICDAR 2003, ICDAR 2011, ICDAR 2005 and SVT, and have achieved better accuracy in text detection and recognition. By using these three datasets, five different experiments have been conducted for evaluating the proposed model. We have also compared the proposed system with other classifiers such as the SVM, the MLP and the CNN. In these comparisons, the proposed model has achieved better classification accuracy than the other existing works.
Keywords: CRF; rules; text detection; text recognition; natural scene images; CR-CNN
An Efficient Text Recognition System from Complex Color Image for Helping the Visually Impaired Persons
14
Authors: Ahmed Ben Atitallah, Mohamed Amin Ben Atitallah, Yahia Said, Mohammed Albekairi, Anis Boudabous, Turki M. Alanazi, Khaled Kaaniche, Mohamed Atri. Source: Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 7, pp. 701-717, 17 pages.
A challenge faced by visually impaired persons in their day-to-day lives is interpreting text from documents. To help these people, the objective of this work is to develop an efficient text recognition system that isolates, extracts, and recognizes text in documents that have a textured background, degraded colors, and poor quality, and synthesizes it into speech. This system basically consists of three algorithms: a text localization and detection algorithm based on the mathematical morphology method (MMM); a text extraction algorithm based on the gamma correction method (GCM); and an optical character recognition (OCR) algorithm for text recognition. A detailed complexity study of the different blocks of this text recognition system has been carried out. Following this study, an acceleration of the GCM algorithm (AGCM) is proposed. The AGCM algorithm reduces the complexity of the text recognition system by 70% while keeping the same quality of text recognition as the original method. To assist visually impaired persons, a graphical interface for the entire text recognition chain has been developed, allowing the capture of images from a camera, rapid and intuitive visualization of the text recognized in the image, and text-to-speech synthesis. Our text recognition system provides an improvement of 6.8% in the recognition rate and 7.6% in the F-measure relative to the GCM and AGCM algorithms.
Keywords: text recognition system; GCM; AGCM; OCR; color images; graphical interface
Embedded System Based Raspberry Pi 4 for Text Detection and Recognition
15
Authors: Turki M. Alanazi. Source: Intelligent Automation & Soft Computing (SCIE), 2023, Issue 6, pp. 3343-3354, 12 pages.
Detecting and recognizing text in natural scene images presents a challenge because the image quality depends on the conditions in which the image is captured, such as viewing angle, blurring, and sensor noise. In this paper, a prototype for text detection and recognition from natural scene images is proposed. This prototype is based on the Raspberry Pi 4 and a Universal Serial Bus (USB) camera, and it embeds our text detection and recognition model, which was developed in Python. Our model uses the deep learning-based Efficient and Accurate Scene Text Detector (EAST) model for text localization and detection, and Tesseract-OCR as the Optical Character Recognition (OCR) engine for text recognition. Our prototype is controlled from a computer over a wireless connection using the Virtual Network Computing (VNC) tool. The experimental results show that the recognition rate for images captured through the camera by our prototype can reach 99.75% with low computational complexity. Furthermore, our prototype achieves a higher recognition rate than the Tesseract software. In addition, it matches the recognition rate of the EasyOCR software on the Raspberry Pi 4 board while reducing the execution time by an average of 89%.
Keywords: text detection; text recognition; OCR engine; natural scene images; Raspberry Pi; USB camera
Text Augmentation-Based Model for Emotion Recognition Using Transformers
16
Authors: Fida Mohammad, Mukhtaj Khan, Safdar Nawaz Khan Marwat, Naveed Jan, Neelam Gohar, Muhammad Bilal, Amal Al-Rasheed. Source: Computers, Materials & Continua (SCIE, EI), 2023, Issue 9, pp. 3523-3547, 25 pages.
Emotion Recognition in Conversations (ERC) is fundamental in creating emotionally intelligent machines. Graph-Based Network (GBN) models have gained popularity in detecting conversational contexts for ERC tasks. However, their limited ability to collect and acquire contextual information hinders their effectiveness. To address this, we propose a Text Augmentation-based computational model for recognizing emotions using transformers (TA-MERT). The proposed model uses the Multimodal Emotion Lines Dataset (MELD), which ensures a balanced representation for recognizing human emotions. The model uses text augmentation techniques to produce more training data, improving the proposed model's accuracy. Transformer encoders train the deep neural network (DNN) model, especially Bidirectional Encoder (BE) representations that capture both forward and backward contextual information. This integration improves the accuracy and robustness of the proposed model. Furthermore, we present a method for balancing the training dataset by creating enhanced samples from the original dataset. By balancing the dataset across all emotion categories, we can lessen the adverse effects of data imbalance on the accuracy of the proposed model. Experimental results on the MELD dataset show that TA-MERT outperforms earlier methods, achieving a weighted F1 score of 62.60% and an accuracy of 64.36%. Overall, the proposed TA-MERT model addresses the GBN models' weaknesses in obtaining contextual data for ERC. The TA-MERT model recognizes human emotions more accurately by employing text augmentation and transformer-based encoding. The balanced dataset and the additional training samples also enhance its resilience. These findings highlight the significance of transformer-based approaches for special emotion recognition in conversations.
Keywords: Emotion recognition in conversation; graph-based network; text augmentation-based model; multimodal emotion lines dataset; bidirectional encoder representation for transformer
An Efficient Hybrid Model for Arabic Text Recognition
17
Authors: Hicham Lamtougui, Hicham El Moubtahij, Hassan Fouadi, Khalid Satori. Computers, Materials & Continua (SCIE, EI), 2023, Issue 2, pp. 2871-2888 (18 pages)
In recent years, Deep Learning models have become indispensable in several fields such as computer vision, automatic object recognition, and automatic natural language processing. The implementation of a robust and efficient handwritten text recognition system remains a challenge for the research community in this field, especially for the Arabic language, which, compared to other languages, has a dearth of published works. In this work, we present a new and efficient system for offline Arabic handwritten text recognition. Our approach is based on the combination of a Convolutional Neural Network (CNN) and a Bidirectional Long-Term Memory (BLSTM) followed by a Connectionist Temporal Classification (CTC) layer. Moreover, during the training phase of the model, we introduce a data augmentation algorithm to increase the quality of data. Our proposed approach can recognize Arabic handwritten texts without the need to segment the characters, thus overcoming several problems related to this point. To train and evaluate our approach, we used two Arabic handwritten text recognition databases, IFN/ENIT and KHATT. The experimental results show that our new approach, compared to other methods in the literature, gives better results.
Keywords: Deep learning; Arabic handwritten text recognition; convolutional neural network (CNN); bidirectional long-term memory (BLSTM); connectionist temporal classification (CTC)
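The abstract above credits the CTC layer with allowing recognition without character segmentation. A minimal sketch of the standard greedy (best-path) CTC decoding rule shows why: the network emits one label distribution per time step, and decoding merges repeated labels and then drops the blank symbol. The blank index, the toy Latin alphabet, and the scores below are assumptions made for illustration; the paper's actual decoder and Arabic character set are not given in this listing.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    collapsed = [label for label, _ in groupby(best_path)]  # merge repeats
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Toy alphabet; index 0 is the blank. A real system would list Arabic characters.
alphabet = ["-", "a", "b"]
frame_scores = [
    [0.10, 0.80, 0.10],  # 'a'
    [0.10, 0.70, 0.20],  # 'a' again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank separates the two 'a' characters
    [0.20, 0.70, 0.10],  # 'a'
    [0.10, 0.10, 0.80],  # 'b'
]
decoded = ctc_greedy_decode(frame_scores, alphabet)
print(decoded)
```

The blank between the second and fourth frames is what lets the decoder keep a genuine double letter instead of merging it away.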
Menu Text Recognition of Few-shot Learning
18
Authors: Xiaoyu, Tian Zhenzhen, Xin Zihao, Liu Suolan, Chen Fuhua, Wang Hongyuan. Journal of New Media, 2022, Issue 3, pp. 137-143 (7 pages)
Recent advances in OCR show that end-to-end (E2E) training pipelines including detection and identification can achieve the best results. However, many existing methods usually focus on case-insensitive English characters. In this paper, we apply an E2E approach, the multiplex multilingual mask TextSpotter, which performs script recognition at the word level and uses different recognition headers to process different scripts while maintaining uniform loss, thus optimizing script recognition and multiple recognition headers simultaneously. Experiments show that this method is superior to the single-head model with a similar number of parameters in end-to-end identification tasks.
Keywords: text recognition; script identification; few-shot learning; multiple languages
An Improved CRNN for Vietnamese Identity Card Information Recognition (cited by: 2)
19
Authors: Trinh Tan Dat, Le Tran Anh Dang, Nguyen Nhat Truong, Pham Cung Le Thien Vu, Vu Ngoc Thanh Sang, Pham Thi Vuong, Pham The Bao. Computer Systems Science & Engineering (SCIE, EI), 2022, Issue 2, pp. 539-555 (17 pages)
This paper proposes an enhancement of an automatic text recognition system for extracting information from the front side of the Vietnamese citizen identity (CID) card. First, we apply Mask-RCNN to segment and align the CID card from the background. Next, we present two approaches to detect the CID card’s text lines, comparing traditional image processing techniques with the EAST detector. Finally, we introduce a new end-to-end Convolutional Recurrent Neural Network (CRNN) model based on a combination of Connectionist Temporal Classification (CTC) and an attention mechanism for Vietnamese text recognition, jointly training the CTC and attention objective functions. The length of the CTC’s output label sequence is applied to the attention-based decoder prediction to make the final label sequence. This process helps to decrease irregular alignments and speed up the label sequence estimation during training and inference, instead of relying only on a data-driven attention-based encoder-decoder to estimate the label sequence in long sentences. The proposed model can be learned directly from a sequence of words without detailed annotations. We evaluate the proposed system on a real collected Vietnamese CID card dataset and find that our method yields a 4.28% WER and outperforms the common techniques.
Keywords: Vietnamese text recognition; OCR; CRNN; BLSTM; attention mechanism; joint CTC-Attention
An End-to-End Transformer-Based Automatic Speech Recognition for Qur’an Reciters (cited by: 1)
20
Authors: Mohammed Hadwan, Hamzah A. Alsayadi, Salah AL-Hagree. Computers, Materials & Continua (SCIE, EI), 2023, Issue 2, pp. 3471-3487 (17 pages)
The attention-based encoder-decoder technique, known as the transformer, is used to enhance the performance of end-to-end automatic speech recognition (ASR). This research focuses on applying end-to-end transformer-based ASR models to the Arabic language, as the research community pays little attention to it. The Qur’an, the holy book of Muslims, is written in diacritized Arabic text. In this paper, an end-to-end transformer model for robust Qur’an verse recognition is proposed. The acoustic model was built as a transformer-based deep learning model using the PyTorch framework. A multi-head attention mechanism is utilized to represent the encoder and decoder in the acoustic model. A Mel filter bank is used for feature extraction. To build a language model (LM), the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) were used to train an n-gram word-based LM. As a part of this research, a new dataset of Qur’an verses and their associated transcripts was collected and processed for training and evaluating the proposed model, consisting of 10 h of .wav recitations performed by 60 reciters. The experimental results showed that the proposed end-to-end transformer-based model achieved a significantly low character error rate (CER) of 1.98% and a word error rate (WER) of 6.16%. We have achieved state-of-the-art end-to-end transformer-based recognition for Qur’an reciters.
Keywords: Attention-based encoder-decoder; recurrent neural network; long short-term memory; Qur’an reciters recognition; diacritized Arabic text
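The CER and WER figures quoted in the abstract above are conventionally computed as the Levenshtein edit distance between the reference transcript and the recognizer output, divided by the reference length (in characters for CER, in words for WER). The sketch below assumes that standard definition; the listing does not describe the authors' scoring script or text normalization, and the function names are chosen for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete = 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate: units are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: units are individual characters."""
    return error_rate(list(reference), list(hypothesis))


print(wer("the cat sat", "the cat sit"))
print(cer("abc", "abd"))
```

For diacritized Arabic, the choice of unit matters: whether a diacritic counts as its own character changes the CER, so the tokenization should be fixed before comparing systems.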