Lexical analysis is a fundamental task in natural language processing that involves several subtasks, such as word segmentation (WS), part-of-speech (POS) tagging, and named entity recognition (NER). Recent work has shown that exploiting the relatedness between these subtasks can be beneficial. This paper proposes a unified neural framework that addresses these subtasks simultaneously. Departing from the sequence tagging paradigm, the proposed method tackles multitask lexical analysis via two-stage sequence span classification. First, the model detects word and named entity boundaries by multilabel classification over character spans in a sentence. Then, it assigns POS labels to words and entity labels to named entities by multi-class classification. Furthermore, a Gated Task Transformation (GTT) is proposed to encourage the model to share valuable features between tasks. The proposed model was evaluated on public Chinese and Thai datasets, where it achieved state-of-the-art results.
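The two-stage span classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicons `WORD_POS` and `ENTITY_TYPE` are hypothetical stand-ins for the trained neural classifiers, the function names and the `max_len` limit are assumptions made for illustration, and the Gated Task Transformation is not modeled at all.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # [start, end) character offsets

# Hypothetical toy lexicons standing in for trained classifiers.
WORD_POS: Dict[str, str] = {"我": "PN", "爱": "VV", "北京": "NR"}
ENTITY_TYPE: Dict[str, str] = {"北京": "LOC"}


def enumerate_spans(n: int, max_len: int) -> List[Span]:
    """List every character span of length 1..max_len in a sentence of n characters."""
    return [(i, j) for i in range(n) for j in range(i + 1, min(n, i + max_len) + 1)]


def detect_boundaries(sentence: str, spans: List[Span]) -> Dict[Span, Set[str]]:
    """Stage 1 (multilabel): a span may be a word, an entity, both, or neither."""
    detected: Dict[Span, Set[str]] = {}
    for i, j in spans:
        text = sentence[i:j]
        labels: Set[str] = set()
        if text in WORD_POS:
            labels.add("word")
        if text in ENTITY_TYPE:
            labels.add("entity")
        if labels:
            detected[(i, j)] = labels
    return detected


def classify_spans(
    sentence: str, detected: Dict[Span, Set[str]]
) -> List[Tuple[Span, str, str]]:
    """Stage 2 (multi-class): one POS tag per word span, one type per entity span."""
    results: List[Tuple[Span, str, str]] = []
    for (i, j), labels in sorted(detected.items()):
        text = sentence[i:j]
        if "word" in labels:
            results.append(((i, j), "POS", WORD_POS[text]))
        if "entity" in labels:
            results.append(((i, j), "NER", ENTITY_TYPE[text]))
    return results


def analyze(sentence: str, max_len: int = 4) -> List[Tuple[Span, str, str]]:
    """Run both stages over all candidate spans of the sentence."""
    spans = enumerate_spans(len(sentence), max_len)
    return classify_spans(sentence, detect_boundaries(sentence, spans))


if __name__ == "__main__":
    for span, task, label in analyze("我爱北京"):
        print(span, task, label)
```

In this sketch the span covering 北京 receives both a `word` and an `entity` label in the first stage, which is the reason that stage is multilabel rather than multi-class; the second stage then labels each detected span separately for each task.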
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62266028, 62266027, U21B2027, and U24A20334), the Major Science and Technology Programs in Yunnan Province (Grant Nos. 202302AD080003, 202402AG050007, and 202303AP140008), the Yunnan Province Basic Research Program (Grant Nos. 202301AS070047, 202301AT070471, and 202401BC070021), and the Kunming University of Science and Technology "Double First-rate" construction joint project (Grant No. 202201BE070001-021).