Funding: Supported by the National Natural Science Foundation of China (Nos. 60975055 and 60803093) and the National High-Tech Research and Development (863) Program of China (No. 2008AA01Z144).
Abstract: Event extraction is an important research topic in information extraction and comprises two sub-tasks: event type recognition and event argument recognition. This paper describes a method for event type recognition based on automatic expansion of event triggers. The event triggers are first expanded through a thesaurus so that candidate events and their candidate types can be extracted. A binary classification method is then used to recognize the candidate event types. This method alleviates both the unbalanced data problem in model training and the data sparseness problem that arises with a small corpus. Evaluations on the ACE2005 dataset give a final F-score of 61.24%, which outperforms traditional methods based purely on machine learning.
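The two-stage idea in this abstract (expand triggers through a thesaurus, then propose candidate events and types for a binary classifier to accept or reject) can be sketched roughly as follows. This is only an illustration under stated assumptions: the seed triggers, the thesaurus entries, and the function names `expand_triggers` and `candidate_events` are hypothetical and are not taken from the paper, and the binary classifier itself is omitted.

```python
from collections import defaultdict

# Hypothetical seed triggers with their event types (illustrative values only).
SEED_TRIGGERS = {"attack": {"Conflict.Attack"}, "die": {"Life.Die"}}

# Hypothetical thesaurus mapping a word to its synonyms.
THESAURUS = {"attack": ["assault", "strike"], "die": ["perish"]}


def expand_triggers(seeds, thesaurus):
    """Give every synonym of a seed trigger the seed's event types."""
    expanded = defaultdict(set)
    for word, types in seeds.items():
        expanded[word] |= types
        for synonym in thesaurus.get(word, []):
            expanded[synonym] |= types
    return dict(expanded)


def candidate_events(tokens, triggers):
    """Return (position, token, candidate_type) triples for matched triggers.

    Each triple is one candidate that a downstream binary classifier
    (not shown here) would label as a true or false event mention.
    """
    candidates = []
    for i, tok in enumerate(tokens):
        for event_type in sorted(triggers.get(tok.lower(), ())):
            candidates.append((i, tok, event_type))
    return candidates


expanded = expand_triggers(SEED_TRIGGERS, THESAURUS)
print(candidate_events("Rebels assault the town".split(), expanded))
```

Framing each (trigger, type) pair as a yes/no decision is what lets a single binary classifier cover all event types, which is one plausible reading of how the approach eases class imbalance.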
Funding: Supported by the National Natural Science Foundation of China (62076031).
Abstract: As a subtask of open domain event extraction (ODEE), new event type induction aims to discover a set of unseen event types from a given corpus. Existing methods mostly adopt semi-supervised or unsupervised learning to achieve this goal, using complex and different objective functions for labeled and unlabeled data respectively. To unify and simplify the objective functions, a reliable pseudo-labeling prediction (RPP) framework for new event type induction is proposed. The framework introduces a double label reassignment (DLR) strategy for unlabeled data based on swap-prediction. The DLR strategy alleviates the model degeneration caused by swap-prediction and further incorporates the real distribution over unseen event types to produce more reliable pseudo labels for unlabeled data. The resulting pseudo labels allow the overall model to be optimized with a unified and simple objective. Experiments show that the RPP framework outperforms the state of the art on the benchmark.
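Abstract: Existing legal case retrieval (LCR) methods make little effective use of case elements and are therefore easily misled by semantic and structural similarity between case contents. To address this problem, a legal case retrieval method that fuses temporal behavior chains with event types is proposed. First, sequence labeling is used to identify the legal event types in the case description, and the behavioral elements in the case text are used to build a temporal behavior chain. This highlights the key elements of the case so that the model focuses on its core content, which addresses the tendency of existing methods to be misled by semantic and structural similarity. Second, segment encoding is used to construct a similarity vector representation matrix for the temporal behavior chains, strengthening the semantic interaction between the behavioral elements of different cases. Finally, an aggregating scorer measures case relevance from three perspectives (temporal behavior chain, legal event type, and crime type), making the case matching score more reasonable. Experimental results show that, compared with the SAILER (Structure-Aware pre-traIned language model for LEgal case Retrieval) method, the proposed method improves P@5 by 4 percentage points, P@10 by 3 percentage points, MAP by 4 percentage points, and NDCG@30 by 0.8 percentage points on LeCaRD (Legal Case Retrieval Dataset). The method can therefore use case elements effectively to avoid interference from semantic and structural similarity between case contents, and it provides a reliable basis for legal case retrieval.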
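For readers unfamiliar with the ranking metrics reported above, the following minimal sketch shows how P@k and average precision (whose mean over queries gives MAP) are commonly computed. It assumes binary relevance labels and uses made-up document identifiers; it is not the evaluation script used with LeCaRD, and NDCG is left out because it depends on graded relevance.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    top = ranked[:k]
    return sum(1 for doc in top if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values taken at each relevant item's rank."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranked, relevant) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical ranking for one query; "a" and "c" are the relevant cases.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c"}
print(precision_at_k(ranked, relevant, 5))
print(average_precision(ranked, relevant))
```

An improvement of "4 percentage points" in P@5 therefore means the average share of relevant cases among the top five results rose by 0.04 in absolute terms.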
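Abstract: Event type induction automatically discovers and names new event types from unlabeled text and can effectively acquire event knowledge across multiple domains. Existing studies treat every sample as a single-event sample and consider only one of the event types it contains, ignoring the negative impact of multi-event samples on event semantic learning and event type naming. To address these problems, an event type induction method combining contrastive learning and iterative optimization is proposed. To counter the impact of multi-event samples on event semantic learning, a prompt-learning-based multi-event detection method is proposed that detects and removes multi-event samples before model training. To optimize event semantic representations, a candidate trigger identification strategy based on abstract meaning representation (AMR) is proposed, and external anchors and clustering pseudo-labels are introduced to improve contrastive learning. To improve the naming quality of unknown event types, an iterative optimization method for event type naming based on ChatGPT feedback is proposed: according to ChatGPT's naming results, samples that harm event type naming are removed, and the model is fine-tuned on the processed dataset. This process is repeated until event type names of the expected quality are generated. Experimental results on the ACE2005 dataset show that the method significantly improves the clustering of unknown event types and effectively generates high-quality event type names.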