Abstract
This paper counts the Korean named entities of each category in the news, comment, and cultural heritage texts of the Klue-ner and Kochet-ner named entity corpora. Based on these counts, it analyzes the distribution of syllable length and the rate at which named entities combine with case particles. The results show that syllable length and case-particle use follow certain regularities across named entity categories. These findings can support named entity classification and can also inform the distributional structure of future Korean named entity corpora.
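The two statistics named in the abstract, syllable length distribution and case-particle combination rate, can be illustrated with a minimal Python sketch. It is not the authors' procedure. The particle list, the `(entity, label, suffix)` input format, the category labels, and the toy sample are all assumptions made for illustration, and none of the data comes from either corpus.

```python
from collections import Counter, defaultdict

# Hypothetical, non-exhaustive list of Korean case particles (an assumption,
# not the inventory used in the paper).
CASE_PARTICLES = ("에서", "에게", "으로", "이", "가", "을", "를", "의", "에", "로")


def syllable_length(text):
    """Count precomposed Hangul syllables (U+AC00..U+D7A3) in text."""
    return sum(1 for ch in text if "\uac00" <= ch <= "\ud7a3")


def summarize(mentions):
    """Summarize entity mentions per category.

    mentions: iterable of (entity, label, suffix) triples, where suffix is the
    text attached directly after the entity in the same word ("" if none).
    Returns (lengths, rates): a per-label Counter of syllable lengths and a
    per-label share of mentions followed by a listed case particle.
    """
    lengths = defaultdict(Counter)
    attached = Counter()
    totals = Counter()
    for entity, label, suffix in mentions:
        lengths[label][syllable_length(entity)] += 1
        totals[label] += 1
        # str.startswith accepts a tuple of candidate prefixes.
        if suffix.startswith(CASE_PARTICLES):
            attached[label] += 1
    rates = {label: attached[label] / totals[label] for label in totals}
    return lengths, rates


# Toy mentions invented for this sketch; labels are illustrative only.
sample = [
    ("서울", "LC", "에서"),
    ("부산", "LC", ""),
    ("김철수", "PS", "가"),
    ("이영희", "PS", "는"),  # topic marker, not in the case-particle list
]
lengths, rates = summarize(sample)
```

In practice the suffix would come from a morphological analyzer rather than simple prefix matching, since short particles such as 이 or 가 can also be the first syllable of an unrelated morpheme.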
Authors
Huang Zheng-hao (黄政豪); Jin Guang-zhu (金光洙)
Engineering College, Yanbian University, Yanji 133002, China; School of Foreign Languages, Yanbian University, Yanji 133002, China
Source
Foreign Language Research (《外语学刊》)
Peking University Core Journal (北大核心)
2025, Issue 1, pp. 9-18 (10 pages)
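Funding
This paper is an interim result of the National Social Science Fund of China Major Bidding Project "Collation and Study of Korean Chinese-Character Resource Documents" (朝鲜汉字资源文献整理与研究, 18ZDA306) and of the Yanbian University World-Class Discipline Construction Key Research Project in Foreign Languages and Literatures (18YLGG01).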
Keywords
Korean
case particle
named entity recognition
feature extraction
noun classification