Journal Articles
28 articles found
Chinese spoken language understanding in SHTQS
1
Authors: 毛家菊 (Mao Jiaju), 郭荣 (Guo Rong), 陆汝占 (Lu Ruzhan). Journal of Harbin Institute of Technology (New Series), EI, CAS, 2005, No. 2, pp. 225-230 (6 pages)
Spoken dialogue systems are an active research field with wide applications. However, the distinctions in Chinese spoken dialogue are not as clear-cut as in English, and Chinese spoken dialogue exhibits many language phenomena. First, most utterances are ill-formed. Second, ellipsis, anaphora and negation are widely used. Determining how to extract semantic information from incomplete sentences and how to resolve negation, anaphora and ellipsis is therefore crucial. SHTQS (Shanghai Transportation Query System) is an intelligent telephone-based spoken dialogue system that provides information about the best route between any two sites in Shanghai. After a brief description of the system, the paper focuses on its natural language processing. Sentences output by the speech recognizer inevitably contain errors, and in a sequential language processing pipeline these errors are easily passed on to later stages, producing a ripple effect. To detect and recover from such errors as early as possible, the language-processing strategies are designed with this in mind. For errors caused by wrongly divided words in speech recognition, segmentation and POS tagging approaches that can rectify these errors are designed. Since most inquiry utterances are ill-formed and negation, anaphora and ellipsis are common, language understanding must be adequately adaptive. A partial syntactic parsing scheme is therefore adopted, using a chart algorithm; the parser is based on unification grammar. The semantic frame extracted from the best arc set of the chart is used to represent sentence meaning. Negation, anaphora and ellipsis are also analyzed, and corresponding processing approaches are presented. The accuracy of the language processing component is 88.39%, and the test results show that the language processing strategies are sound and effective.
Keywords: spoken dialogue system; natural language understanding; syntactic parsing
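The abstract states that a semantic frame represents each utterance and that ellipsis is resolved against it. The sketch below is a minimal illustration built on three assumptions: a hypothetical two-slot route frame (origin, destination), hypothetical Shanghai place names, and a simple rule that inherits any slot an elliptical follow-up leaves empty from the previous turn's frame. It is not the paper's chart-parser-based method.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class RouteFrame:
    """Hypothetical semantic frame for a route query; slot names are illustrative."""
    origin: Optional[str] = None
    destination: Optional[str] = None


def resolve_ellipsis(current: RouteFrame, previous: Optional[RouteFrame]) -> RouteFrame:
    """Fill slots left empty by an elliptical utterance with values from the previous turn."""
    if previous is None:
        return current
    merged = {}
    for f in fields(RouteFrame):
        value = getattr(current, f.name)
        # Keep what the user said in this turn; otherwise inherit the earlier value.
        merged[f.name] = value if value is not None else getattr(previous, f.name)
    return RouteFrame(**merged)


# Turn 1: "From People's Square to Xujiahui."  Turn 2 (elliptical): "And to Pudong Airport?"
turn1 = RouteFrame(origin="People's Square", destination="Xujiahui")
turn2 = RouteFrame(destination="Pudong Airport")
print(resolve_ellipsis(turn2, turn1))
```

The resolved frame keeps the origin from the first turn and takes the new destination from the elliptical second turn.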
Evaluation on ChatGPT for Chinese Language Understanding (cited by 6)
2
Authors: Linhan Li, Huaping Zhang, Chunjin Li, Haowen You, Wenyao Cui. Data Intelligence, EI, 2023, No. 4, pp. 885-903 (19 pages)
ChatGPT has attracted extensive attention from academia and industry. This paper evaluates ChatGPT's Chinese language understanding capability on 6 tasks using 11 datasets. Experiments indicate that ChatGPT achieves competitive results in Chinese sentiment analysis, summarization, and reading comprehension, while it is prone to factual errors in closed-book QA. Further, on two more difficult Chinese understanding tasks, idiom fill-in-the-blank and cant understanding, we found that a simple chain-of-thought prompt can improve ChatGPT's accuracy in complex reasoning. Based on these results, the paper further analyzes the possible risks of using ChatGPT. Finally, we briefly describe the research and development progress of our ChatBIT.
Keywords: Language model; ChatGPT; ChatBIT; Chinese language understanding; Artificial intelligence
Linguistic Hypotheses Concerning Natural Language Understanding (cited by 1)
3
Author: 袁毓林 (Yuan Yulin). Social Sciences in China, 1995, No. 4, pp. 131-142, 218 (13 pages)
Keywords: ROCK; Linguistic Hypotheses Concerning Natural Language Understanding
The research and realization about automatic abstracting based on text clustering and natural language understanding
4
Authors: GUO Qing-lin, FAN Xiao-zhong, LIU Chang-an. Frontiers of Electrical and Electronic Engineering in China, CSCD, 2006, No. 4, pp. 460-464 (5 pages)
A method for automatic abstracting based on text clustering and natural language understanding is explored, with the aim of overcoming the shortcomings of some current methods. The method uses text clustering and can perform automatic abstracting of multiple documents. A two-pass word segmentation algorithm based on the title and the first sentences of paragraphs is investigated; its precision and recall are above 95%. For a specific domain, plastics, an automatic abstracting system named TCAAS is implemented. The precision and recall of its multi-document automatic abstracting are above 75%. The experiments also show that the method can feasibly be used to develop a domain-specific automatic abstracting system, which is valuable for further in-depth study.
Keywords: automatic abstracting; text clustering; natural language understanding
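Word segmentation precision and recall are commonly computed by comparing the character spans of predicted words with those of a gold-standard segmentation. The abstract does not state its exact evaluation protocol, so this span-based scoring is an assumption. The example sentence is hypothetical, and the sketch does not reproduce the paper's two-pass segmentation algorithm itself.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmented sentence to the set of (start, end) character offsets of its words."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def segmentation_precision_recall(gold: List[str], predicted: List[str]) -> Tuple[float, float]:
    """A predicted word counts as correct only if its span exactly matches a gold word."""
    if "".join(gold) != "".join(predicted):
        raise ValueError("gold and predicted segmentations must cover the same text")
    g, p = word_spans(gold), word_spans(predicted)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    return precision, recall


# Hypothetical example: the segmenter merged two gold words into one.
gold = ["自动", "文摘", "系统"]
predicted = ["自动文摘", "系统"]
print(segmentation_precision_recall(gold, predicted))  # -> (0.5, 0.3333333333333333)
```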
Classification of Conversational Sentences Using an Ensemble Pre-Trained Language Model with the Fine-Tuned Parameter
5
Authors: R. Sujatha, K. Nimala. Computers, Materials & Continua, SCIE, EI, 2024, No. 2, pp. 1669-1686 (18 pages)
Sentence classification is the process of categorizing a sentence based on its context. Sentence categorization requires more semantic features than tasks such as dependency parsing, which rely more on syntactic elements. Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence, recognizing the progress of the conversation, or comparing impacts. Here, an ensemble of pre-trained language models is used to classify the sentences of a conversation corpus into four categories: information, question, directive, and commission. The resulting label sequences are used to analyze the progress of a conversation and to predict its ordering. An ensemble of Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Generative Pre-Trained Transformer (GPT), DistilBERT, and Generalized Autoregressive Pretraining for Language Understanding (XLNet) models is trained on the conversation corpus, and hyperparameter tuning is carried out for better sentence classification performance. This Ensemble of Pre-trained Language Models with Hyperparameter Tuning (EPLM-HT) system is trained on an annotated conversation dataset. The proposed approach outperformed the base BERT, GPT, DistilBERT, and XLNet transformer models, achieving an F1 score of 0.88.
Keywords: Bidirectional encoder for representation of transformer; conversation; ensemble model; fine-tuning; generalized autoregressive pretraining for language understanding; generative pre-trained transformer; hyperparameter tuning; natural language processing; robustly optimized BERT pretraining approach; sentence classification; transformer models
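The abstract does not say how the ensemble members' outputs are combined or which F1 averaging is reported. The sketch below rests on two stated assumptions: soft voting (averaging each model's class probabilities) and macro-averaged F1 over the four sentence labels. The probability values are made up for illustration.

```python
from typing import Sequence

LABELS = ("information", "question", "directive", "commission")


def soft_vote(model_probs: Sequence[Sequence[float]]) -> str:
    """Average per-class probabilities across models and return the highest-scoring label."""
    n_models = len(model_probs)
    averaged = [sum(probs[i] for probs in model_probs) / n_models for i in range(len(LABELS))]
    best = max(range(len(LABELS)), key=lambda i: averaged[i])
    return LABELS[best]


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1, over labels that occur in gold or predictions."""
    labels = [lab for lab in LABELS if lab in gold or lab in pred]
    scores = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical class probabilities from three of the ensemble members for one sentence.
bert = [0.1, 0.6, 0.2, 0.1]
roberta = [0.2, 0.5, 0.2, 0.1]
gpt = [0.5, 0.2, 0.2, 0.1]
print(soft_vote([bert, roberta, gpt]))  # -> question
```

Soft voting lets a confident minority model shift the decision, whereas hard (majority) voting would only count each model's top label. Both are plausible choices, and the paper may use either.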
Recommender System for Information Retrieval Using Natural Language Querying Interface Based in Bibliographic Research for Naïve Users
6
Authors: Mohamed Chakraoui, Abderrafiaa Elkalay, Naoual Mouhni. International Journal of Intelligence Science, 2022, No. 1, pp. 9-20 (12 pages)
With the increasing amount of data on the internet, data analysis has become indispensable for saving time and improving efficiency, especially in bibliographic information retrieval systems. The number of active scientific journals can be estimated at around 40,000, with about four million articles published each year. Machine learning and deep learning applied to recommender systems have become unavoidable in both industry and research. In the current work, we propose an optimized interface for bibliographic information retrieval as a running example, which allows different kinds of researchers to find what they need according to relevant criteria through natural language understanding. Papers indexed in Web of Science and Scopus are in high demand. Natural language processing, including text- and linguistics-based techniques such as tokenization, named entity recognition, and syntactic and semantic analysis, is used to process natural language queries. Our interface uses association rules to find more related papers for recommendation, and spanning trees are employed to optimize the system's search process.
Keywords: Recommender Systems; Collaborative Filtering; Apriori Algorithm; Natural Language Understanding; Bibliographic Research
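The interface is described as using association rules to find related papers, and the keywords name the Apriori algorithm. The pair-only sketch below assumes that each reading session is a set of paper IDs, and it uses hypothetical support and confidence thresholds. It shows how such rules turn into recommendations. A full Apriori implementation would also grow itemsets beyond pairs.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def pair_rules(sessions: Sequence[Set[str]], min_support: float,
               min_confidence: float) -> Dict[str, List[Tuple[str, float]]]:
    """Mine rules X -> Y between single papers from sets of papers read together."""
    n = len(sessions)
    item_counts = Counter(item for s in sessions for item in s)
    # Apriori pruning: a pair can only be frequent if both of its items are frequent.
    frequent = {item for item, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter(
        pair for s in sessions for pair in combinations(sorted(s & frequent), 2)
    )
    rules: Dict[str, List[Tuple[str, float]]] = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # estimate of P(y read | x read)
            if confidence >= min_confidence:
                rules.setdefault(x, []).append((y, confidence))
    return rules


def recommend(rules: Dict[str, List[Tuple[str, float]]], paper: str) -> List[str]:
    """Papers implied by `paper`, highest confidence first (ties broken by ID)."""
    return [y for y, _ in sorted(rules.get(paper, []), key=lambda t: (-t[1], t[0]))]


# Hypothetical sessions: each set holds the IDs of papers one user looked at together.
sessions = [{"P1", "P2"}, {"P1", "P2", "P3"}, {"P1", "P3"}, {"P2", "P3"}, {"P1", "P2"}]
rules = pair_rules(sessions, min_support=0.4, min_confidence=0.6)
print(recommend(rules, "P3"))  # -> ['P1', 'P2']
```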
A Natural Language Generation Algorithm for Greek by Using Hole Semantics and a Systemic Grammatical Formalism
7
Authors: Ioannis Giachos, Eleni Batzaki, Evangelos C. Papakitsos, Stavros Kaminaris, Nikolaos Laskaris. Journal of Computer Science Research, 2023, No. 4, pp. 27-37 (11 pages)
This work reports progress on previous related work, based on an experiment to improve the intelligence of robotic systems with the aim of achieving richer linguistic communication between humans and robots. In this paper, the authors attempt an algorithmic approach to natural language generation through hole semantics, applying the OMAS-III computational model as a grammatical formalism. The original work used a technical language, whereas later works replaced it with a limited Greek natural language dictionary. This effort aimed to give the evolving system the ability to ask questions, and the authors also developed an initial dialogue system using these techniques. The results suggest that these techniques can lead to a more sophisticated dialogue system in the future.
Keywords: Natural language processing; Natural language generation; Natural language understanding; Dialog system; Systemic grammar formalism; OMAS-III; HRI; Virtual assistant; Hole semantics
A review of transformer models in drug discovery and beyond (cited by 1)
8
Authors: Jian Jiang, Long Chen, Lu Ke, Bozheng Dou, Chunhuan Zhang, Hongsong Feng, Yueying Zhu, Huahai Qiu, Bengong Zhang, Guo-Wei Wei. Journal of Pharmaceutical Analysis, 2025, No. 6, pp. 1187-1201 (15 pages)
Transformer models have emerged as pivotal tools in drug discovery, distinguished by their unique architectural features and exceptional performance in handling complex data. Leveraging the innate ability of transformer architectures to capture intricate hierarchical dependencies in sequential data, these models show remarkable efficacy across various tasks, including new drug design and drug target identification. The adaptability of pre-trained transformer-based models makes them indispensable assets for data-centric advances in drug discovery, chemistry, and biology, providing a robust framework that accelerates innovation and discovery in these domains. Beyond their technical prowess, the success of transformer-based models in drug discovery, chemistry, and biology extends to their interdisciplinary potential: they combine biological, physical, chemical, and pharmacological insights to bridge gaps across diverse disciplines. This integrative approach not only enhances the depth and breadth of research but also fosters collaboration and the exchange of ideas among disparate fields. In our review, we describe the many applications of transformers in drug discovery, chemistry, and biology, spanning protein design and protein engineering, molecular dynamics (MD), drug target identification, transformer-enabled drug virtual screening (VS), drug lead optimization, drug addiction, small data set challenges, chemical and biological image analysis, chemical language understanding, and single-cell data. Finally, we conclude the survey by discussing promising trends for transformer models in drug discovery and other sciences.
Keywords: Transformer; Drug discovery; Chemical language understanding; Molecular dynamics; Protein design
The Joint Model of Multi-Intent Detection and Slot Filling Based on Bidirectional Interaction Structure
9
Authors: WANG Changjing, ZENG Xianghui, WANG Yuxin, SUN Yuxin, ZUO Zhengkang. Wuhan University Journal of Natural Sciences, 2025, No. 1, pp. 21-31 (11 pages)
Intent detection and slot filling are two important components of natural language understanding. Because the two tasks are closely related, they are often trained jointly to improve performance. Most existing studies use a joint model of multi-intent detection and slot filling with unidirectional interaction, which improves overall performance by fusing intent information into the slot-filling part. Building on this, and to further improve overall performance by exploiting the correlation between the two tasks, this paper proposes a joint multi-intent detection and slot-filling model based on a bidirectional interaction structure. The model fuses intent encoding information into the encoding part of slot filling and fuses slot decoding information into the decoding part of intent detection. Experimental results on two public multi-intent joint training datasets, MixATIS and MixSNIPS, show that the proposed bidirectional interaction structure effectively improves the performance of the joint model. In addition, to verify that the bidirectional interaction structure between intent and slot generalizes, a joint model for single-intent scenarios is derived from the proposed model; it also achieves excellent performance on two public single-intent joint training datasets, CAIS and SNIPS.
Keywords: natural language understanding; multi-intent detection; slot filling; bidirectional interaction; joint training
Semi-Supervised New Intention Discovery for Syntactic Elimination and Fusion in Elastic Neighborhoods
10
Authors: Di Wu, Liming Feng, Xiaoyu Wang. Computers, Materials & Continua, 2025, No. 4, pp. 977-999 (23 pages)
Semi-supervised new intent discovery is a significant research focus in natural language understanding. To address the limitations of current semi-supervised training data and the underutilization of implicit information, a Semi-supervised New Intent Discovery for Elastic Neighborhood Syntactic Elimination and Fusion model (SNID-ENSEF) is proposed. Syntactic elimination contrastive learning leverages verb-dominant syntactic features, systematically replacing specific words to enhance data diversity. The radius of the positive-sample neighborhood is adjusted elastically to eliminate invalid samples and improve training efficiency. A neighborhood sample fusion strategy, based on sample distribution patterns, dynamically adjusts the neighborhood size and fuses sample vectors to reduce noise and improve both the utilization of implicit information and discovery accuracy. Experimental results on the Banking77, StackOverflow, and Clinc150 datasets show that SNID-ENSEF achieves average improvements of 0.88%, 1.27%, and 1.30% in Normalized Mutual Information (NMI), Accuracy (ACC), and Adjusted Rand Index (ARI), respectively, outperforming the PTJN, DPN, MTP-CLNN, and DWG models. The code is available at https://github.com/qsdesz/SNID-ENSEF (accessed on 16 January 2025).
Keywords: Natural language understanding; semi-supervised new intent discovery; syntactic elimination contrast learning; neighborhood sample fusion strategies; bidirectional encoder representations from transformers (BERT)
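Of the three clustering metrics reported, ARI has a compact closed form based on pair counts in the contingency table between true intents and discovered clusters. The function below is a from-scratch sketch of the standard ARI formula and is not taken from the SNID-ENSEF repository. The toy label lists are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index between two clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_true), 2)
    # Pairs placed together in both clusterings, counted per contingency-table cell.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    if total_pairs == 0:
        return 1.0
    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:  # degenerate case, e.g. both labelings are one single cluster
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Cluster IDs are arbitrary, so a pure relabeling still scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # 4/7, about 0.571
```

Unlike raw accuracy, ARI is invariant to how discovered clusters are numbered and is adjusted so that a random assignment scores close to 0.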
Exploring Latent Semantic Information for Textual Emotion Recognition in Blog Articles (cited by 3)
11
Authors: Xin Kang, Fuji Ren, Yunong Wu. IEEE/CAA Journal of Automatica Sinica, SCIE, EI, CSCD, 2018, No. 1, pp. 204-216 (13 pages)
Understanding people's emotions through natural language is a challenging task for intelligent systems based on the Internet of Things (IoT). The major difficulty is the lack of basic knowledge about emotion expressions across a variety of real-world contexts. In this paper, we propose a Bayesian inference method that explores latent semantic dimensions as contextual information in natural language and learns knowledge of emotion expressions based on these semantic dimensions. Our method simultaneously infers the latent semantic dimensions as topics in words and predicts emotion labels in both word-level and document-level texts. The Bayesian inference results allow us to visualize the connection between words and emotions with respect to different semantic dimensions. By further incorporating a corpus-level hierarchy into the document emotion distribution assumption, we can balance the document emotion recognition results and achieve even better word and document emotion predictions. Our word-level and document-level emotion prediction experiments, based on the well-developed Chinese emotion corpus Ren-CECps, show both higher accuracy and better robustness than state-of-the-art emotion prediction algorithms.
Keywords: Bayesian inference; emotion-topic model; emotion recognition; multi-label classification; natural language understanding
The Impact of Semi-Supervised Learning on the Performance of Intelligent Chatbot System (cited by 1)
12
Authors: Sudan Prasad Uprety, Seung Ryul Jeong. Computers, Materials & Continua, SCIE, EI, 2022, No. 5, pp. 3937-3952 (16 pages)
Artificial intelligence-based dialog systems are attracting attention from both business and academic communities. The key parts of such intelligent chatbot systems are domain classification, intent detection, and named entity recognition, and various supervised, unsupervised, and hybrid approaches are used for each. Such intelligent systems, also called natural language understanding systems, analyze user requests in sequential order: domain classification, then intent detection, then entity recognition based on the semantic rules of the classified domain. This sequential approach propagates errors downstream; i.e., if the domain classification model fails to classify the domain, intent detection and entity recognition also fail. Furthermore, training such a system requires a large user-annotated dataset for each domain. To address these issues, this study proposes a single joint predictive deep neural network framework based on long short-term memory (LSTM) that uses only a small user-annotated dataset. It investigates the value added by incorporating unlabeled data from user chat logs into multi-domain spoken language understanding systems. Systematic experimental analysis of the proposed joint frameworks, along with the semi-supervised multi-domain model, using open-source annotated and unannotated utterances shows a robust improvement in the predictive performance of the proposed multi-domain intelligent chatbot over a base joint model and a joint model based on adversarial learning.
Keywords: Chatbot; dialog system; joint learning; LSTM; natural language understanding; semi-supervised learning
Recent Advances on Human-Computer Dialogue (cited by 6)
13
Authors: Xiaojie Wang, Caixia Yuan. CAAI Transactions on Intelligence Technology, 2016, No. 4, pp. 303-312 (10 pages)
Human-computer dialogue systems provide a natural language based interface between humans and computers. They are in wide demand for network information services, intelligent companion robots, and so on. A human-computer dialogue system typically consists of three parts: Natural Language Understanding (NLU), Dialogue Management (DM), and Natural Language Generation (NLG). Each part has several subtasks, each of which has received considerable attention, and many improvements have been achieved on each. However, systems built in the traditional pipeline way, where different subtasks are assembled sequentially, suffer from problems such as error accumulation and amplification and difficulty of domain transfer. Research on jointly modeling several subtasks within one part or across different parts has therefore grown greatly in recent years, especially with the rapid development of joint models based on deep neural networks. A few works even aim to integrate all subtasks of a dialogue system into a single model, namely end-to-end models. This paper first introduces two basic frameworks of current dialogue systems and gives a brief survey of recent advances on various subtasks, and then focuses on joint models for multiple dialogue subtasks. We review several different joint models, including the integration of several subtasks inside NLU or NLG, joint modeling across NLG and DM, and joint modeling through NLU, DM, and NLG. Both the advantages and the problems of these joint models are discussed. We consider joint models, or end-to-end models, to be an important trend in developing human-computer dialogue systems.
Keywords: Human-Computer dialogue system; Natural language understanding; Dialogue Management; Natural Language Generation; Joint model
3D Model Reconstruction Based on Process Information (cited by 1)
14
Authors: SHI Yun-fei, ZHANG Shu-sheng, CAO Ju-lu, FAN Hai-tao, YANG Yan. Computer Aided Drafting, Design and Manufacturing, 2007, No. 2, pp. 15-22 (8 pages)
The traditional strategy of 3D model reconstruction concentrates mainly on orthographic projections or engineering drawings, but it has shortcomings: only a few kinds of solids can be reconstructed, time complexity is high, and little information about the 3D model is obtained. This research extends that work by treating the process card as part of the 3D reconstruction. A set of process data is a superset of the set of 2D engineering drawings. It comprises process drawings and process steps, and shows the sequential, asymptotic course by which a part is made from rough blank to final product. Exploiting these characteristics, the object to be reconstructed is translated from complicated engineering drawings into a series of much simpler process drawings. With the plentiful process information added for reconstruction, disturbances such as irrelevant graphics, symbols, and labels can be avoided. Moreover, the change in form between neighboring process drawings is so small that interpreting the engineering drawings poses no difficulty. In addition, abnormal solutions and multiple solutions can be avoided during reconstruction, and the method ultimately becomes applicable to more kinds of objects. A practical 3D reconstruction method therefore becomes possible. The feature information in process cards is also supplied to the reconstructed model. Focusing on process cards, the paper analyzes the feasibility and requirements of Working Procedure Model reconstruction and studies how to apply natural language understanding to 3D reconstruction. An asymptotic approximation product method is proposed, by which a 3D process model can be constructed automatically and intelligently. The process model not only includes information about part characteristics but can also deliver design, process, and engineering information to downstream applications.
Keywords: 3D model reconstruction; natural language understanding; process cards; working procedure model; feature model
SHTQS: a telephone-based Chinese spoken dialogue system
15
Authors: Mao Jiaju, Chen Qiulin, Gao Feng, Guo Rong, Lu Ruzhan. Journal of Systems Engineering and Electronics, SCIE, EI, CSCD, 2005, No. 4, pp. 881-885 (5 pages)
SHTQS is an intelligent telephone-based spoken dialogue system providing information about the best route between two sites in Shanghai. Instead of using separate stages for speech decoding and language parsing, SHTQS achieves close cooperation by integrating the automatic speech recognizer (ASR), language understanding, dialogue management, and the speech generator. In this way, erroneous analyses and uncertainties arising in earlier stages can be recovered and resolved accurately with high-level knowledge. Moreover, instead of shallow word-level analysis or simple keyword or key-phrase matching, our system performs deeper analysis by integrating a robust parser and a semantic interpreter. The robust parser is particularly important for spontaneous speech input, because most inquiry sentences and phrases are ill-formed. In addition, understanding users' inquiries is essential in designing a mixed-initiative dialogue system, and simply matching keywords and/or key phrases can hardly achieve this; therefore, a semantic interpreter is incorporated into our system. The performance of SHTQS is also evaluated: dialogue efficiency averages 4.4 sentences per query, and the case precision rate of the language understanding module reaches 81%. The results are satisfactory.
Keywords: spoken dialogue system; ASR; natural language understanding; NLG; TTS
Mining the Chatbot Brain to Improve COVID-19 Bot Response Accuracy
16
Authors: Mukhtar Ghaleb, Yahya Almurtadha, Fahad Algarni, Monir Abdullah, Emad Felemban, Ali M. Alsharafi, Mohamed Othman, Khaled Ghilan. Computers, Materials & Continua, SCIE, EI, 2022, No. 2, pp. 2619-2638 (20 pages)
People often communicate with auto-answering tools such as conversational agents because of their 24/7 availability and unbiased responses. However, chatbots are normally designed for specific purposes and areas of expertise and cannot answer questions outside their scope. Chatbots employ Natural Language Understanding (NLU) to infer their responses. There is a need for a chatbot that can learn from inquiries and expand its area of expertise over time. Such a chatbot must be able to build profiles representing intended topics, in a way similar to the human brain, for fast retrieval. This study proposes a methodology to enhance a chatbot's brain functionality by clustering available knowledge bases into sets of related themes and building representative profiles. We used a COVID-19 information dataset to evaluate the proposed methodology; the pandemic has been accompanied by an "infodemic" of fake news. The chatbot was evaluated by a medical doctor and in a public trial with 308 real users. The evaluations were statistically analyzed to measure effectiveness, efficiency, and satisfaction as described by the ISO 9241 standard. The proposed COVID-19 chatbot system relieves doctors of answering questions and illustrates how technology can be used to handle an infodemic.
Keywords: Machine learning; text classification; e-health chatbot; COVID-19 awareness; natural language understanding
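The methodology clusters the knowledge base into themes and builds a representative profile per theme so that an inquiry can be matched quickly. The sketch below relies on three assumptions: the clusters are already given, a profile is the centroid of bag-of-words vectors, and routing uses cosine similarity. The abstract does not specify the representation, so none of these choices comes from the paper. The sample questions are hypothetical.

```python
import re
from collections import Counter
from math import sqrt
from typing import Dict, List

Vector = Dict[str, float]


def bag_of_words(text: str) -> Counter:
    """Lowercased alphanumeric tokens with their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def build_profile(questions: List[str]) -> Vector:
    """Centroid of the bag-of-words vectors of all questions in one theme cluster."""
    total: Counter = Counter()
    for q in questions:
        total.update(bag_of_words(q))
    return {term: count / len(questions) for term, count in total.items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query: str, profiles: Dict[str, Vector]) -> str:
    """Return the theme whose profile is most similar to the incoming inquiry."""
    q = dict(bag_of_words(query))
    return max(profiles, key=lambda theme: cosine(q, profiles[theme]))


clusters = {
    "symptoms": ["What are the symptoms of COVID?", "Is fever a symptom of COVID?"],
    "vaccines": ["Where can I get a vaccine?", "Is the vaccine safe?"],
}
profiles = {theme: build_profile(qs) for theme, qs in clusters.items()}
print(route("Is the vaccine free?", profiles))  # -> vaccines
```

Comparing an inquiry against one profile per theme, rather than against every stored question, is what makes the lookup fast as the knowledge base grows.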
Baseline Isolated Printed Text Image Database for Pashto Script Recognition
17
Authors: Arfa Siddiqu, Abdul Basit, Waheed Noor, Muhammad Asfandyar Khan, M. Saeed H. Kakar, Azam Khan. Intelligent Automation & Soft Computing, SCIE, 2023, No. 7, pp. 875-885 (11 pages)
Optical character recognition for right-to-left and cursive scripts such as Arabic is challenging and, compared with Latin-script languages, has received little attention from researchers. Moreover, the absence of a standard, publicly available dataset for several low-resource languages, including Pashto, has remained a hurdle to the advancement of language processing. Recognizing that a clean dataset is the fundamental requirement of character recognition, this research begins with dataset generation and works toward a system capable of complete language understanding, with the goal of fully autonomous recognition of the cursive Pashto script. The first achievement of this research is a clean, standard dataset of the isolated characters of the Pashto script. This paper introduces a database of isolated Pashto characters covering forty-four letters in various font styles. To overcome the shortage of font styles, the graphics software Inkscape was used to generate sufficient image samples for each character. The dataset was pre-processed, reduced to 32×32 pixels, and converted into binary format with a black background and white text so that it resembles the Modified National Institute of Standards and Technology (MNIST) database. The benchmark database is publicly available for further research on GitHub and Kaggle in both pixel and Comma Separated Values (CSV) formats.
Keywords: Text-image database; optical character recognition (OCR); Pashto; isolated characters; visual recognition; autonomous language understanding; deep learning; convolutional neural network (CNN)
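The preprocessing described for the Pashto dataset can be sketched with the standard library: characters are binarized to white on a black background, and pixel rows are exported as CSV, MNIST-style. This sketch makes three assumptions: images are already resized to 32×32 grayscale arrays (resizing would need an imaging library and is not shown), the threshold is 128, and each row holds the label followed by the pixels. The abstract does not give the published dataset's exact threshold or column layout.

```python
import csv
import io
from typing import Iterable, List, Tuple

Image = List[List[int]]  # rows of grayscale values, 0 (black) to 255 (white)


def binarize_inverted(gray: Image, threshold: int = 128) -> Image:
    """Dark ink on a light page becomes white (255) strokes on a black (0) background."""
    return [[255 if px < threshold else 0 for px in row] for row in gray]


def to_csv_row(label: int, image: Image) -> List[int]:
    """Label first, then pixels flattened row by row (a 32x32 image gives 1 + 1024 values)."""
    return [label] + [px for row in image for px in row]


def write_csv(samples: Iterable[Tuple[int, Image]], stream) -> None:
    """Binarize each grayscale sample and write it as one CSV row."""
    writer = csv.writer(stream)
    for label, gray in samples:
        writer.writerow(to_csv_row(label, binarize_inverted(gray)))


# Tiny 2x2 stand-in for a 32x32 character image, with a hypothetical class label 3.
buffer = io.StringIO()
write_csv([(3, [[10, 240], [250, 5]])], buffer)
print(buffer.getvalue().strip())  # -> 3,255,0,0,255
```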
An Open-Source Large Language Model for Chinese Education Research (cited by 1)
18
Authors: Wentao Liu, Hao Hao, Aimin Zhou. Frontiers of Digital Education, 2025, No. 2, pp. 117-124 (8 pages)
Research on open-source large language models (LLMs) has made significant progress, but most studies focus predominantly on general-purpose English data, which poses challenges for LLM research in Chinese education. To address this, this research first reviewed and synthesized the core technologies of representative open-source LLMs and then designed an advanced 1.5B-parameter LLM tailored to the Chinese education field. The Chinese education large language model (CELLM) is trained from scratch in two stages: pre-training and instruction fine-tuning. In the pre-training phase, an open-source dataset for the Chinese education domain is used. For the instruction fine-tuning stage, a Chinese instruction dataset comprising over 258,000 entries was developed and open-sourced. Finally, the results and analysis of CELLM on multiple evaluation datasets are presented, providing a reference baseline for future research. All models, data, and code are open source to foster community research on LLMs in the Chinese education domain.
Keywords: open source; Chinese education; large language models; Chinese education research; excelling extension; large language model; measuring massive multitask language understanding
Microblog Summarization via Enriching Contextual Features Based on Sentence-Level Semantic Analysis (cited by 1)
19
Authors: Senlin Luo, Qianrou Chen, Jia Guo, Ji Zhang, Limin Pan. Journal of Beijing Institute of Technology, EI, CAS, 2017, No. 4, pp. 505-516 (12 pages)
This paper proposes a novel microblog summarization approach that enriches contextual features through sentence-level semantic analysis. First, a Chinese sentential semantic model (CSM) is employed to analyze the semantic structure of each microblog sentence. Then, sentence-level semantic analysis is combined with latent Dirichlet allocation (LDA) to acquire extra features and related words that enrich the collection of microblog messages. The similarity between two sentences is calculated based on the enriched features. Finally, a semantic weight and a relation weight are calculated to select the most informative sentences, which form the final summary of the microblog messages. Experimental results demonstrate the advantages of the proposed approach. They indicate three findings. First, introducing sentence-level semantic analysis for context enrichment better represents sentential semantics. Second, the proposed criteria, namely semantic weight and relation weight, improve the summarization results. Third, CSM is a useful framework for sentence-level semantic analysis.
Keywords: microblog summarization; language models; language parsing and understanding; natural language processing
New Perspectives of Stylistic Study (cited by 1)
20
Authors: REN Yue-shu, REN Ai-shu, WANG Hua-min. Journal of Northeast Agricultural University (English Edition), CAS, 2005, No. 1, pp. 94-96 (3 pages)
As an open and ever-developing discipline with an integrated theoretical system and applied principles, stylistics is exerting a growing influence on more and more fields. This paper discusses stylistic studies from the perspectives of cognitive linguistics, applied linguistics, corpus linguistics, and pragmatics in turn, to demonstrate the discipline's great vitality.
Keywords: STYLISTICS; expression and understanding of language; computer science; CORPUS; sharing of human civilization