7 articles found
1. Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling (Cited 4 times)
Authors: Liangping Ding, Zhixiong Zhang, Huan Liu, Jie Li, Gaihong Yu. Journal of Data and Information Science (CSCD), 2021, Issue 3, pp. 35-57 (23 pages).
Purpose: Automatic keyphrase extraction (AKE) is an important task for grasping the main points of a text. In this paper, we aim to combine the benefits of the sequence labeling formulation and pretrained language models to propose an automatic keyphrase extraction model for Chinese scientific research text. Design/methodology/approach: We regard AKE from Chinese text as a character-level sequence labeling task to avoid the segmentation errors of Chinese tokenizers and initialize our model with the pretrained language model BERT, released by Google in 2018. We collect data from the Chinese Science Citation Database and construct a large-scale dataset from the medical domain, which contains 100,000 abstracts as the training set, 6,000 abstracts as the development set, and 3,094 abstracts as the test set. We use unsupervised keyphrase extraction methods, including term frequency (TF), TF-IDF, and TextRank, and supervised machine learning methods, including Conditional Random Field (CRF), Bidirectional Long Short-Term Memory network (BiLSTM), and BiLSTM-CRF, as baselines. Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models. Findings: Compared with character-level BiLSTM-CRF, the best baseline model with an F1 score of 50.16%, our character-level sequence labeling model based on BERT obtains an F1 score of 59.80%, a 9.64% absolute improvement. Research limitations: We only consider the automatic keyphrase extraction task rather than keyphrase generation, so only keyphrases that occur in the given text can be extracted. In addition, our proposed dataset is not suitable for dealing with nested keyphrases. Practical implications: We make our character-level IOB-format dataset of Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts (CAKE) publicly available for the benefit of the research community at https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction. Originality/value: By designing comparative experiments, our study demonstrates that the character-level formulation is more suitable for the Chinese automatic keyphrase extraction task under the general trend of pretrained language models. Our proposed dataset also provides a unified method for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent.
Keywords: Automatic keyphrase extraction; Character-level sequence labeling; Pretrained language model; Scientific Chinese medical abstracts
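Note: a minimal sketch (not the authors' released code) of the character-level IOB conversion this abstract describes, where every Chinese character is tagged B, I, or O depending on whether it begins, continues, or lies outside a gold keyphrase; the function name and the example sentence are illustrative assumptions.

```python
def to_char_iob(abstract, keyphrases):
    """Label every character of `abstract` with B/I/O tags for the given keyphrases."""
    tags = ["O"] * len(abstract)
    for phrase in keyphrases:
        start = 0
        while True:
            idx = abstract.find(phrase, start)
            if idx == -1:
                break
            # Tag only spans that are still unlabeled, mirroring the paper's note
            # that nested or overlapping keyphrases are not handled by the dataset.
            if all(t == "O" for t in tags[idx:idx + len(phrase)]):
                tags[idx] = "B"
                for i in range(idx + 1, idx + len(phrase)):
                    tags[i] = "I"
            start = idx + len(phrase)
    return list(zip(abstract, tags))


if __name__ == "__main__":
    text = "本文提出一种字符级序列标注的关键词自动抽取模型。"
    for char, tag in to_char_iob(text, ["字符级序列标注", "关键词自动抽取"]):
        print(char, tag)
```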
2. Text Structured Algorithm of Lung Cancer Cases Based on Deep Learning
Authors: MI Linhui, YUAN Junyi, ZHOU Yankang, HOU Xumin. Journal of Shanghai Jiaotong University (Science), 2025, Issue 4, pp. 778-789 (12 pages).
Surgical site infections (SSIs) are the most common healthcare-related infections in patients with lung cancer. Constructing a lung cancer SSI risk prediction model requires the extraction of relevant risk factors from lung cancer case texts, which involves two types of text structuring tasks: attribute discrimination and attribute extraction. This article proposes a joint model, Multi-BGLC, for these two tasks, using bidirectional encoder representations from Transformers (BERT) as the encoder and fine-tuning a decoder composed of a graph convolutional neural network (GCNN), long short-term memory (LSTM), and conditional random field (CRF) on cancer case data. The GCNN is used for attribute discrimination, whereas the LSTM and CRF are used for attribute extraction. Experiments verified the effectiveness and accuracy of the model compared with other baseline models.
Keywords: text structuring; text classification; sequence labeling; data augmentation; lung cancer; electronic medical record
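Note: a structural skeleton, not the published Multi-BGLC implementation, of a joint model with a shared BERT encoder, a sentence-level head for attribute discrimination, and a token-level BiLSTM head for attribute extraction; the GCNN and CRF components named in the abstract are simplified to plain linear layers here, and the model name, label counts, and layer sizes are assumptions.

```python
# Requires torch and transformers; "bert-base-chinese" is an assumed checkpoint.
import torch
import torch.nn as nn
from transformers import BertModel


class JointAttributeModel(nn.Module):
    """Sketch of a joint attribute-discrimination / attribute-extraction model."""

    def __init__(self, bert_name="bert-base-chinese", num_attrs=5, num_tags=3):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # Sentence-level attribute discrimination head (stand-in for the paper's GCNN).
        self.attr_head = nn.Linear(hidden, num_attrs)
        # Token-level extraction head (BiLSTM; the paper adds a CRF on top).
        self.lstm = nn.LSTM(hidden, hidden // 2, bidirectional=True, batch_first=True)
        self.tag_head = nn.Linear(hidden, num_tags)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        attr_logits = self.attr_head(out.pooler_output)      # [batch, num_attrs]
        seq, _ = self.lstm(out.last_hidden_state)
        tag_logits = self.tag_head(seq)                       # [batch, seq_len, num_tags]
        return attr_logits, tag_logits
```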
3. End-to-end multi-granulation causality extraction model (Cited 1 time)
Authors: Miao Wu, Qinghua Zhang, Chengying Wu, Guoyin Wang. Digital Communications and Networks (CSCD), 2024, Issue 6, pp. 1864-1873 (10 pages).
Causality extraction has become a crucial task in natural language processing and knowledge graph construction. However, most existing methods divide causality extraction into two subtasks: extraction of candidate causal pairs and classification of causality. These methods result in cascading errors and the loss of associated contextual information. Therefore, in this study, based on graph theory, an End-to-end Multi-Granulation Causality Extraction model (EMGCE) is proposed to extract explicit causality and directly mine implicit causality. First, sentences are represented on different granulation layers, which contain character, word, and contextual string layers. The word layer is further fine-grained into three layers: word-index, word-embedding, and word-position-embedding layers. Then, a granular causality tree of the dataset is built based on the word-index layer. Next, an improved tagREtriplet algorithm is designed to obtain the labeled causality based on the granular causality tree; it transforms the task into a sequence labeling task. Subsequently, the multi-granulation semantic representation is fed into a neural network model to extract causality. Finally, on the extended public SemEval 2010 Task 8 dataset, the experimental results demonstrate that EMGCE is effective.
Keywords: Causality extraction; Granular computing; Granular causality tree; Semantic representation; Sequence labeling
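Note: a simplified illustration of recasting annotated causal pairs as a sequence labeling problem, which is the general idea behind the labeling step this abstract describes; the B-C/I-C/B-E/I-E tag scheme and the example are assumptions, not the authors' tagREtriplet algorithm.

```python
def tag_causal_pair(tokens, cause, effect):
    """Tag cause spans as B-C/I-C, effect spans as B-E/I-E, and everything else as O."""
    tags = ["O"] * len(tokens)

    def mark(span, prefix):
        # Find the first occurrence of the span and tag it.
        for start in range(len(tokens) - len(span) + 1):
            if tokens[start:start + len(span)] == span:
                tags[start] = f"B-{prefix}"
                for i in range(start + 1, start + len(span)):
                    tags[i] = f"I-{prefix}"
                return

    mark(cause, "C")
    mark(effect, "E")
    return list(zip(tokens, tags))


if __name__ == "__main__":
    sentence = "heavy rain caused severe flooding in the city".split()
    print(tag_causal_pair(sentence, ["heavy", "rain"], ["severe", "flooding"]))
```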
4. Segmentation-Free Recognition Algorithm Based on Deep Learning for Handwritten Text Image
Author: Ge Peng. Journal of Artificial Intelligence and Technology, 2024, Issue 2, pp. 169-178 (10 pages).
Segmentation-based offline handwritten character recognition algorithms suffer from the difficulty of segmenting interleaved and touching characters in handwritten manuscripts. To tackle this problem, a segmentation-free recognition algorithm based on a deep learning network is proposed in this paper. The network consists of four layers: an input layer for image preprocessing, a convolutional neural network (CNN) layer for feature extraction, a bidirectional long short-term memory (BDLSTM) layer for sequence prediction, and a connectionist temporal classification (CTC) layer for text sequence alignment and classification. In addition, a novel data processing method is applied for data length equalization. On this basis, groups of experiments on six typical databases are carried out, covering the evaluation indicators of character correct rate, training time cost, storage space cost, and testing time cost. The experimental results show that the proposed algorithm performs better in accuracy and efficiency than other classical algorithms.
Keywords: deep learning; image processing; segmentation-free; handwritten image recognition; sequence labeling
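Note: a minimal CRNN-style sketch of the four-stage pipeline this abstract outlines (CNN features, bidirectional LSTM over the width dimension, CTC loss for alignment-free transcription); it is an illustration rather than the paper's network, and the layer sizes, image dimensions, and class count are assumptions.

```python
import torch
import torch.nn as nn


class CRNN(nn.Module):
    def __init__(self, num_classes, img_height=32):
        super().__init__()
        # Feature extraction (stands in for the paper's CNN layer).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_h = img_height // 4
        # Sequence prediction over the width dimension (BDLSTM in the paper).
        self.rnn = nn.LSTM(128 * feat_h, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, num_classes)       # classes include the CTC blank (index 0)

    def forward(self, images):                      # images: [batch, 1, H, W]
        f = self.cnn(images)                        # [batch, 128, H/4, W/4]
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # one feature vector per image column
        seq, _ = self.rnn(f)
        return self.fc(seq).log_softmax(-1)         # [batch, W/4, num_classes]


# CTC training step on dummy data: log-probs must be shaped [T, batch, classes].
model = CRNN(num_classes=100)
log_probs = model(torch.randn(2, 1, 32, 128)).permute(1, 0, 2)
targets = torch.randint(1, 100, (2, 10))
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           torch.full((2,), log_probs.size(0)),
                           torch.full((2,), 10))
```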
5. Hashtag Recommendation Using LSTM Networks with Self-Attention (Cited 2 times)
Authors: Yatian Shen, Yan Li, Jun Sun, Wenke Ding, Xianjin Shi, Lei Zhang, Xiajiong Shen, Jing He. Computers, Materials & Continua (SCIE, EI), 2019, Issue 9, pp. 1261-1269 (9 pages).
On Twitter, people often use hashtags to mark the subject of a tweet. Tweets thus have specific themes or content that are easy for people to manage. With the increase in the number of tweets, how to automatically recommend hashtags for tweets has received wide attention. Previous hashtag recommendation methods converted the task into a multi-class classification problem. However, these methods can only recommend hashtags that appeared in historical information and cannot recommend new ones. In this work, we extend the self-attention mechanism to turn the hashtag recommendation task into a sequence labeling task. To train and evaluate the proposed method, we used real tweet data collected from Twitter. Experimental results show that the proposed method performs significantly better than the most advanced methods; compared with state-of-the-art methods, the accuracy of our method is increased by 4%.
Keywords: Hashtag recommendation; self-attention; neural networks; sequence labeling
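Note: an illustrative sketch, with assumed layer sizes and tag set, of the framing this abstract describes: a BiLSTM encoder with a self-attention layer over its outputs and a per-token head that labels hashtag-worthy spans, rather than a multi-class classifier over a fixed hashtag vocabulary.

```python
import torch
import torch.nn as nn


class HashtagTagger(nn.Module):
    """Sequence labeling view of hashtag recommendation: tag tokens, do not classify tweets."""

    def __init__(self, vocab_size, emb_dim=100, hidden=128, num_tags=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_tags)   # e.g. B/I/O over tweet tokens

    def forward(self, token_ids):
        x = self.emb(token_ids)
        h, _ = self.lstm(x)
        a, _ = self.attn(h, h, h)                    # self-attention over the tweet
        return self.out(a)                           # [batch, seq_len, num_tags]


logits = HashtagTagger(vocab_size=30000)(torch.randint(0, 30000, (4, 20)))
print(logits.shape)   # torch.Size([4, 20, 3])
```

Because the model labels spans inside the tweet instead of choosing from a closed label set, it can surface hashtags that never appeared in the training data, which is the advantage the abstract claims over multi-class formulations.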
6. Fine-Grained Opinion Extraction from Chinese Car Reviews with an Integrated Strategy (Cited 1 time)
Authors: WANG Yinglin, WANG Ming. Journal of Shanghai Jiaotong University (Science) (EI), 2018, Issue 5, pp. 620-626 (7 pages).
With the rapid development of E-commerce, a large amount of data, including reviews about different types of products, can be accessed within a short time. On top of this, opinion mining, especially fine-grained opinion mining, is becoming increasingly effective for extracting valuable information for product design, improvement, and brand marketing. However, limited by the unstructured and casual expression of opinions, one cannot extract valuable information conveniently. In this paper, we propose an integrated strategy to automatically extract feature-based information, with which one can easily acquire detailed opinions about certain products. To adapt to the reviews' characteristics, our strategy is made up of a multi-label classification (MLC) for reviews, a binary classification (BC) for sentences, and sentence-level sequence labelling with a deep learning method. In experiments, our approach achieves 82% accuracy in the final sequence labelling task under a 20-fold cross-validation setting. In addition, the strategy can be conveniently applied to other reviews as long as a corresponding amount of labelled data is available for startup.
Keywords: opinion extraction; multi-label classification (MLC); binary classification (BC); sequence labelling; recurrent neural network (RNN)
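Note: a pipeline sketch of the three-stage integrated strategy described above; the stage interfaces are assumptions, and the classifiers are stand-in callables rather than the trained MLC, BC, and sequence labelling models from the paper.

```python
def extract_opinions(review_sentences, mlc_model, bc_model, tagger):
    """Run review-level MLC, sentence-level BC, then sequence labelling on the kept sentences."""
    review_text = "".join(review_sentences)
    aspects = mlc_model(review_text)                 # multi-label: which aspects the review covers
    opinions = []
    for sentence in review_sentences:
        if not bc_model(sentence):                   # binary: does this sentence carry an opinion?
            continue
        tags = tagger(sentence)                      # sequence labels over the sentence's tokens
        opinions.append((sentence, tags))
    return aspects, opinions


if __name__ == "__main__":
    # Toy stand-ins only; real models would be trained on labelled car reviews.
    aspects, opinions = extract_opinions(
        ["动力很强。", "外观一般。"],
        mlc_model=lambda text: ["动力", "外观"],
        bc_model=lambda sent: "很" in sent,
        tagger=lambda sent: ["O"] * len(sent),
    )
    print(aspects, opinions)
```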
7. MAL: multilevel active learning with BERT for Chinese textual affective structure analysis
Authors: Shufeng XIONG, Guipei ZHANG, Xiaobo FAN, Wenjie TIAN, Lei XI, Hebing LIU, Haiping SI. Frontiers of Information Technology & Electronic Engineering, 2025, Issue 6, pp. 833-846 (14 pages).
Chinese textual affective structure analysis (CTASA) is a sequence labeling task that often relies on supervised deep learning methods. However, acquiring a large annotated dataset for training can be costly and time-consuming. Active learning offers a solution by selecting the most valuable samples to reduce labeling costs. Previous approaches focused on uncertainty or diversity but faced challenges such as biased models or the selection of insignificant samples. To address these issues, multilevel active learning (MAL) is introduced, which leverages deep textual information at both the sentence and word levels, taking into account the complex structure of the Chinese language. By integrating sentence-level features extracted from bidirectional encoder representations from Transformers (BERT) embeddings and word-level probability distributions obtained through a conditional random field (CRF) model, MAL comprehensively captures the Chinese textual affective structure (CTAS). Experimental results demonstrate that MAL significantly reduces annotation costs by approximately 70% and achieves more consistent performance than baseline methods.
Keywords: Sentiment analysis; Sequence labeling; Active learning (AL); Bidirectional encoder representations from Transformers (BERT)
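Note: an illustrative acquisition function, not the paper's exact MAL criterion, that combines word-level uncertainty (entropy of per-token CRF marginals) with sentence-level diversity (distance of a BERT sentence embedding from the already selected pool); the weighting scheme and all names are assumptions.

```python
import numpy as np


def mal_select(token_marginals, sent_embeddings, selected_embeddings, k=10, alpha=0.5):
    """Score unlabeled sentences by uncertainty plus diversity and return the top-k indices."""
    # Word level: mean per-token entropy of the CRF marginals (higher = more uncertain).
    uncertainty = np.array([
        -np.mean(np.sum(p * np.log(p + 1e-12), axis=-1)) for p in token_marginals
    ])
    # Sentence level: distance to the closest already-selected BERT sentence embedding.
    if len(selected_embeddings):
        selected = np.stack(selected_embeddings)
        dists = np.linalg.norm(sent_embeddings[:, None, :] - selected[None, :, :], axis=-1)
        diversity = dists.min(axis=1)
    else:
        diversity = np.ones(len(sent_embeddings))
    score = alpha * uncertainty + (1 - alpha) * diversity
    return np.argsort(-score)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    marginals = [rng.dirichlet(np.ones(5), size=20) for _ in range(100)]  # 100 sentences, 20 tokens, 5 tags
    embeddings = rng.normal(size=(100, 768))                              # assumed BERT sentence vectors
    print(mal_select(marginals, embeddings, selected_embeddings=[], k=5))
```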