Abstract
Topics are often omitted at the beginning of Chinese punctuation clauses (PCs). To recover these missing topics more accurately, the function that scores candidate topic clauses (CTCs) during topic clause (TC) identification needs to be improved. Two features are proposed for this purpose: a context similarity feature for the generated topic clause, and a local similarity feature for the junction between the topic string and the adjacent comment. An evaluation function is designed for each feature. Experimental results show that using the two evaluation functions together raises TC identification accuracy by 5.72 percentage points.
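The abstract does not give the form of the evaluation functions, so the following is only a minimal sketch of the general idea: score each candidate topic clause by combining a context similarity with a local similarity, then keep the best-scoring candidate. The character-count cosine measure, the weight `alpha`, and the names `cosine`, `score`, and `best_topic` are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the character-count vectors of two strings."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def score(topic: str, comment: str, context: str, alpha: float = 0.5) -> float:
    """Hypothetical evaluation function: weighted sum of two similarities."""
    candidate = topic + comment                # candidate topic clause (CTC)
    context_sim = cosine(candidate, context)   # CTC vs. preceding context
    local_sim = cosine(topic, comment)         # topic string vs. adjacent comment
    return alpha * context_sim + (1 - alpha) * local_sim


def best_topic(candidates: list[str], comment: str, context: str) -> str:
    """Return the candidate topic string whose CTC scores highest."""
    return max(candidates, key=lambda t: score(t, comment, context))
```

Here `best_topic` would be called with the candidate topic strings for a punctuation clause, the clause itself as `comment`, and the preceding text as `context`. A weighted sum is only one possible way to combine the two scores.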
Source
Journal of Beijing University of Technology (《北京工业大学学报》)
CAS
CSCD
PKU Core (北大核心)
2014, No. 1, pp. 43-48 (6 pages)
Funding
National Natural Science Foundation of China (61171129)
Innovation Team Improvement Program for Universities under Beijing Municipality (IDHT20130519)
Keywords
generalized topic
topic clause
similarity
context similarity
local similarity