Journal articles
6 articles found
1. MT-Oriented English PoS Tagging and Its Application to Noun Phrase Chunking
Authors: Ma Jianjun, Huang Degen, Liu Haixia, Sheng Wenfeng. China Communications (SCIE, CSCD), 2012, No. 3, pp. 58-67 (10 pages).
A hybrid approach to English part-of-speech (PoS) tagging is presented, with English-Chinese machine translation in the business domain as its target application. It demonstrates how an existing tagger can be adapted to learn from a small amount of data and to handle unknown words for machine translation. A small (998K) annotated English corpus in the business domain was built semi-automatically based on a new tagset; a maximum entropy model is adopted, and a rule-based approach is used for post-processing. The tagger is further applied to noun phrase (NP) chunking. Experiments show that the tagger achieves an accuracy of 98.14%. In the NP chunking application, it yields a 2.21% increase in F-score compared with the results obtained using the Stanford tagger.
Keywords: English PoS tagging; maximum entropy; rule-based approach; machine translation; NP chunking
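The abstract pairs a maximum entropy model with rule-based post-processing. As a minimal sketch under stated assumptions, the Python below trains a multinomial logistic regression (maximum entropy) tagger with greedy left-to-right decoding and then applies one correction rule. The feature set, the `post_process` rule, and the toy corpus are hypothetical illustrations, not the authors' tagset, features, or rules.

```python
import math
from collections import defaultdict

def features(words, i, prev_tag):
    """Local context features for token i (hypothetical feature set)."""
    w = words[i]
    return [f"w={w.lower()}", f"suf3={w[-3:].lower()}",
            f"prev={prev_tag}", f"cap={w[:1].isupper()}", "bias"]

class MaxEntTagger:
    """Multinomial logistic regression (maximum entropy) tagger, greedy decoding."""

    def __init__(self, tags):
        self.tags = sorted(tags)
        self.weights = defaultdict(float)  # keyed by (feature, tag)

    def probs(self, feats):
        scores = [sum(self.weights[(f, t)] for f in feats) for t in self.tags]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, corpus, epochs=50, lr=0.5):
        for _ in range(epochs):
            for words, gold in corpus:
                prev = "<s>"
                for i, gold_tag in enumerate(gold):
                    feats = features(words, i, prev)
                    # Log-likelihood gradient: observed minus expected counts.
                    for t, p in zip(self.tags, self.probs(feats)):
                        g = (1.0 if t == gold_tag else 0.0) - p
                        for f in feats:
                            self.weights[(f, t)] += lr * g
                    prev = gold_tag  # condition on the gold previous tag

    def tag(self, words):
        out, prev = [], "<s>"
        for i in range(len(words)):
            p = self.probs(features(words, i, prev))
            prev = self.tags[max(range(len(p)), key=p.__getitem__)]
            out.append(prev)
        return out

def post_process(words, tags):
    """Hypothetical correction rule: numeric tokens such as '1,200' become CD."""
    fixed = list(tags)
    for i, w in enumerate(words):
        if w.replace(",", "").replace(".", "").isdigit():
            fixed[i] = "CD"
    return fixed

TOY_CORPUS = [
    (["The", "company", "sold", "shares"], ["DT", "NN", "VBD", "NNS"]),
    (["The", "bank", "bought", "bonds"], ["DT", "NN", "VBD", "NNS"]),
    (["Profits", "rose", "sharply"], ["NNS", "VBD", "RB"]),
]
tagger = MaxEntTagger({t for _, tags in TOY_CORPUS for t in tags})
tagger.train(TOY_CORPUS)
sentence = ["The", "bank", "sold", "1,200", "shares"]
print(post_process(sentence, tagger.tag(sentence)))
```

Keeping the rule layer separate from the statistical model means corrections for a narrow domain can be added without retraining, which is one plausible reading of why a hybrid design suits a small training corpus.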
2. Unified Framework of Performing Chinese Word Segmentation and Part-of-Speech Tagging (cited by 5)
Authors: Zhang Kaixu, Sun Maosong. China Communications (SCIE, CSCD), 2012, No. 3, pp. 1-9 (9 pages).
The paper proposes a unified framework that combines the advantages of the fast one-at-a-time approach and the high-performance all-at-once approach to Chinese word segmentation (CWS) and part-of-speech (PoS) tagging. In this framework, the input to the PoS tagger is a candidate set of several CWS results provided by the CWS model. The widely used one-at-a-time and all-at-once approaches are the two extreme cases of the proposed candidate-based approach. Experiments on Penn Chinese Treebank 5 and Tsinghua Chinese Treebank show that the generalized candidate-based approach outperforms the one-at-a-time approach and even the all-at-once approach, while also being faster than the time-consuming all-at-once approach. The authors compare three methods for generating the candidate set, based on sentences, words, and character intervals; the word-based method performs best.
Keywords: natural language processing; Chinese word segmentation; PoS tagging; candidate; word lattice
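The candidate-based idea can be illustrated with a small sketch: a first-stage segmenter proposes segmentations, the k best are kept, and the tagger picks the best joint (segmentation, tags) pair. With k=1 this reduces to the one-at-a-time pipeline; keeping every candidate turns it into an exhaustive joint search over this toy model. The lexicons, scores, and function names are hypothetical, and the tagger scores each word independently rather than using a real sequence model.

```python
import heapq

# Hypothetical segmentation scores (log-probabilities) from a first-stage CWS model.
SEG_LEX = {"研究": -1.0, "研究生": -0.8, "生命": -1.5, "命": -1.0,
           "研": -5.0, "究": -5.0, "生": -4.0}
# Hypothetical PoS scores per word; unseen words get a heavy penalty.
TAG_LEX = {"研究": {"VV": -0.5, "NN": -1.2}, "研究生": {"NN": -0.5},
           "生命": {"NN": -0.3}, "命": {"NN": -3.0}}
UNKNOWN = {"NN": -10.0}

def segmentations(text):
    """Enumerate every split of text into lexicon words with its summed score."""
    if not text:
        return [([], 0.0)]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in SEG_LEX:
            for rest, score in segmentations(text[end:]):
                results.append(([head] + rest, SEG_LEX[head] + score))
    return results

def tag_words(words):
    """Pick the best tag per word independently (no tag transitions, for brevity)."""
    tags, score = [], 0.0
    for w in words:
        options = TAG_LEX.get(w, UNKNOWN)
        best = max(options, key=options.get)
        tags.append(best)
        score += options[best]
    return tags, score

def candidate_based(text, k=None):
    """k=1 behaves like one-at-a-time; k=None considers all segmentations."""
    segs = segmentations(text)
    cands = segs if k is None else heapq.nlargest(k, segs, key=lambda c: c[1])
    best_total, best_pairs = float("-inf"), None
    for words, seg_score in cands:
        tags, tag_score = tag_words(words)
        if seg_score + tag_score > best_total:
            best_total, best_pairs = seg_score + tag_score, list(zip(words, tags))
    return best_pairs

print(candidate_based("研究生命", k=1))  # [('研究生', 'NN'), ('命', 'NN')]
print(candidate_based("研究生命", k=2))  # [('研究', 'VV'), ('生命', 'NN')]
```

In this toy setting the segmenter alone prefers 研究生|命, but once the second candidate is visible to the tagger, the combined score favours 研究|生命, which is the kind of error recovery a candidate set is meant to allow.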
3. A Semi-Supervised Approach for Aspect Category Detection and Aspect Term Extraction from Opinionated Text (cited by 1)
Authors: Bishrul Haq, Sher Muhammad Daudpota, Ali Shariq Imran, Zenun Kastrati, Waheed Noor. Computers, Materials & Continua (SCIE, EI), 2023, No. 10, pp. 115-137 (23 pages).
The Internet has become a significant source for sharing information and for users to express opinions about products and their associated aspects. Learning from product reviews is essential; however, to respond to such reviews, it is equally important to extract the aspects of the entity to which they refer. Aspect-based sentiment analysis (ABSA) deals with aspects extracted from opinionated text. The literature proposes various approaches to ABSA, but most research focuses on supervised approaches, which require labeled datasets with manual sentiment polarity labels and aspect tags. This study proposes a semi-supervised approach with minimal human supervision that extracts aspect terms by first detecting aspect categories. It therefore addresses two main ABSA sub-tasks: aspect category detection (ACD) and aspect term extraction (ATE). In the first sub-task, aspect categories are extracted using topic modeling, further filtered by an oracle, and then fed to zero-shot learning as prompts together with the augmented text. The predicted categories serve as input for finding similar phrases among extracted meaningful phrases (e.g., nouns, proper nouns, and named entity recognition (NER) entities) in order to detect the aspect terms. The study sets baseline accuracies for the two sub-tasks on the Multi-Aspect Multi-Sentiment (MAMS) dataset and on SemEval-2014 Task 4 Subtask 1, showing that the proposed approach helps detect aspect terms via aspect categories.
Keywords: natural language processing; sentiment analysis; aspect-based sentiment analysis; topic modeling; PoS tagging; zero-shot learning
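The final step, matching predicted categories against extracted noun phrases, might look roughly like the sketch below. It assumes the review is already PoS-tagged with Penn Treebank tags, uses cosine similarity over hypothetical toy embeddings in place of a real encoder, and uses an arbitrary threshold of 0.9; none of these choices come from the paper.

```python
import math

# Hypothetical toy embeddings standing in for a real word/phrase encoder.
EMBEDDINGS = {
    "food": (0.9, 0.1, 0.0),
    "service": (0.2, 0.9, 0.0),
    "pizza": (0.8, 0.2, 0.1),
    "waiter": (0.1, 0.9, 0.1),
}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def candidate_phrases(tagged):
    """Keep nouns and proper nouns as candidate aspect terms (PoS filter)."""
    return [w.lower() for w, tag in tagged if tag in NOUN_TAGS]

def aspect_terms(tagged, category, threshold=0.9):
    """Candidates whose embedding is close to the predicted category's embedding."""
    cat = EMBEDDINGS[category]
    return [w for w in candidate_phrases(tagged)
            if w in EMBEDDINGS and cosine(EMBEDDINGS[w], cat) >= threshold]

review = [("The", "DT"), ("pizza", "NN"), ("was", "VBD"), ("great", "JJ"),
          ("but", "CC"), ("the", "DT"), ("waiter", "NN"), ("was", "VBD"),
          ("rude", "JJ")]
print(aspect_terms(review, "food"))     # ['pizza']
print(aspect_terms(review, "service"))  # ['waiter']
```

The PoS filter is what keeps adjectives such as "great" and "rude" out of the candidate pool, so the similarity step only has to rank plausible aspect nouns.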
4. X-News dataset for online news categorization
Authors: Samia Nawaz Yousafzai, Hooria Shahbaz, Armughan Ali, Amreen Qamar, Inzamam Mashood Nasir, Sara Tehsin, Robertas Damasevicius. International Journal of Intelligent Computing and Cybernetics, 2024, No. 4, pp. 737-758 (22 pages).
Purpose: The objective is to develop a more effective model that simplifies and accelerates news classification using advanced text mining and deep learning (DL) techniques. A distributed framework based on Bidirectional Encoder Representations from Transformers (BERT) was developed to classify news headlines. The approach applies various text mining and DL techniques on a distributed infrastructure, aiming to offer an alternative to traditional news classification methods.
Design/methodology/approach: This study classifies distinct types of news by analyzing tweets from various news channels. It addresses the limitations of relying on benchmark datasets for news classification, which often yield models that are impractical for real-world applications.
Findings: The framework was evaluated on a newly proposed dataset and on two additional benchmark datasets from the Kaggle repository, assessing the performance of each text mining and classification method across these datasets. The results show that the proposed strategy significantly outperforms other approaches in both accuracy and execution time, indicating that the distributed framework, combined with BERT for text analysis, provides a robust solution for efficiently analyzing large volumes of data. The findings also highlight the value of the newly released corpus for further research in news classification and emotion classification.
Originality/value: This research introduces a distributed framework for news classification that addresses the shortcomings of models trained on benchmark datasets. By combining current techniques with a novel dataset, the study reports significant improvements in accuracy and processing speed. The released corpus is a contribution that enables further work on news and emotion classification, and the work has practical implications for building more effective and efficient news classification systems.
Keywords: news categorization; BERT classifier; PoS tagging; social media analytics; deep learning for text analysis; sentiment analysis
5. Chinese New Word Identification: A Latent Discriminative Model with Global Features (cited by 11)
Authors: Sun Xiao (孙晓), Huang Degen (黄德根), Song Haiyu (宋海玉), Ren Fuji (任福继). Journal of Computer Science & Technology (SCIE, EI, CSCD), 2011, No. 1, pp. 14-24 (11 pages).
Chinese new words are particularly problematic in Chinese natural language processing. With the rapid development of the Internet and the information explosion, it is impossible to obtain a complete system lexicon for Chinese natural language processing applications, because new words absent from dictionaries are constantly being created. New word identification and POS tagging are usually performed as separate procedures, so lexical information cannot be fully exploited as features. A latent discriminative model that combines the strengths of the Latent Dynamic Conditional Random Field (LDCRF) and the semi-CRF is proposed to detect new words together with their POS tags simultaneously, regardless of new-word type, from Chinese text that has not been pre-segmented. Unlike a plain semi-CRF, the proposed latent discriminative model uses LDCRF to generate candidate entities, which speeds up training and lowers the computational cost. The complexity of the proposed hidden semi-CRF can be further adjusted by tuning the number of hidden variables and the number of candidate entities taken from the N-best outputs of the LDCRF model. A new-word-generating framework is proposed for model training and testing, under which the definitions and distributions of new words match those in real text. A global feature called "Global Fragment Features" is adopted for new word identification. The model was tested on the SIGHAN-6 corpus. Experimental results show that the proposed method detects even low-frequency new words together with their POS tags with satisfactory results, and that it performs competitively with state-of-the-art models.
Keywords: new word identification; new words; PoS tagging; conditional random fields; hidden semi-CRF; global fragment features
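The candidate-restriction idea (LDCRF's N-best outputs supply the segments a semi-CRF may consider) can be sketched as semi-Markov Viterbi decoding over a fixed span set. This is a simplification under stated assumptions: span and transition scores are given as hypothetical numbers, and the hidden variables and global fragment features of the proposed model are omitted.

```python
from collections import defaultdict

def semi_markov_viterbi(n, spans, trans):
    """Best labeled segmentation of positions 0..n using only the given spans.

    spans: {(start, end, label): score} with start < end, e.g. candidates kept
           from a first-stage N-best list (hypothetical scores).
    trans: {(prev_label, label): score}; missing pairs score 0.
    Returns a list of (start, end, label), or None if no full cover exists.
    """
    chart = defaultdict(dict)  # chart[end][label] = (score, (start, prev_label))
    chart[0]["<s>"] = (0.0, None)
    # Processing spans in order of end position guarantees chart[start] is final.
    for (start, end, label), score in sorted(spans.items(), key=lambda kv: kv[0][1]):
        for prev_label, (prev_score, _) in list(chart[start].items()):
            total = prev_score + trans.get((prev_label, label), 0.0) + score
            if label not in chart[end] or total > chart[end][label][0]:
                chart[end][label] = (total, (start, prev_label))
    if not chart[n]:
        return None
    label = max(chart[n], key=lambda lab: chart[n][lab][0])
    path, end = [], n
    while end > 0:  # follow backpointers to the start of the sentence
        _, (start, prev_label) = chart[end][label]
        path.append((start, end, label))
        end, label = start, prev_label
    return path[::-1]

# Hypothetical candidate spans for a 4-character input.
spans = {(0, 2, "NN"): 2.0, (2, 4, "NN"): 1.5, (0, 4, "NN"): 2.5,
         (0, 1, "JJ"): 0.5, (1, 2, "NN"): 0.2}
trans = {("NN", "NN"): 0.5}
print(semi_markov_viterbi(4, spans, trans))  # [(0, 2, 'NN'), (2, 4, 'NN')]
```

Because the decoder only visits the candidate spans rather than every possible segment of the sentence, shrinking the candidate list directly shrinks the work per sentence, which is consistent with the speed-up the abstract attributes to generating candidates with LDCRF.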
6. Pretrained Models and Evaluation Data for the Khmer Language
Authors: Shengyi Jiang, Sihui Fu, Nankai Lin, Yingwen Fu. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, No. 4, pp. 709-718 (10 pages).
Trained on a large corpus, pretrained models (PTMs) can capture different levels of concepts in context and hence generate universal language representations, which greatly benefit downstream natural language processing (NLP) tasks. In recent years, PTMs have been widely used in most NLP applications, especially for high-resource languages such as English and Chinese. However, scarce resources have hindered the development of PTMs for low-resource languages. This work presents Transformer-based PTMs for the Khmer language for the first time. We evaluate our models on two downstream tasks: part-of-speech tagging and news categorization; the dataset for the latter task is self-constructed. Experiments demonstrate the effectiveness of the Khmer models. In addition, we find that current Khmer word segmentation technology does not improve performance. We aim to release our models and datasets to the community to facilitate the future development of Khmer NLP applications.
Keywords: pretrained models; Khmer language; word segmentation; part-of-speech (PoS) tagging; news categorization