Word-Internal Structure Annotation for OpenHowNet
1
Authors: 董家暄, 辛欣, 生浩然. Journal of Chinese Information Processing (《中文信息学报》), PKU Core, 2025, No. 10, pp. 54-65 (12 pages)
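Chinese words have two kinds of internal structure analogous to syntax, constituency and dependency, and word-internal structure reflects the syntactic and semantic relations between characters. Building on OpenHowNet and on prior work, this paper manually annotates the word-internal constituency structure of the complete vocabulary in the knowledge base, 126,036 entries in total. Of these, 77,302 were annotated in this work (including 152 polysemous words whose senses differ in structure), 27,944 were extracted from existing annotations, and 20,790 are single characters that need no annotation. On top of the constituency annotation, word-internal dependency structures are predicted, reaching an LAS of 95.47%. The work enriches the corpus information in the OpenHowNet knowledge base from a syntactic-parsing perspective. With word-internal syntactic information available in OpenHowNet, word-internal structure prediction in real-world settings can move from direct prediction to knowledge-base lookup plus prediction: words found in OpenHowNet receive their structure directly, and only the remaining words are predicted. Under this lookup-plus-prediction scheme, constituency structure prediction on the CTB5 dataset reaches an F1 of 92.40%, about 4 percentage points above direct prediction; dependency structure prediction on the WIST dataset reaches an LAS of 90.60%, about 7 percentage points above the baseline model.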
Keywords: corpus annotation; word-internal structure; OpenHowNet
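The lookup-plus-prediction scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `make_resolver` and `right_branching` are hypothetical, the bracketed structure for 计算机 is an illustrative bracketing rather than an entry copied from the annotated resource, and the fallback predictor is a trivial placeholder standing in for a trained model.

```python
from typing import Callable, Dict, Tuple

# A word-internal constituency structure, written as a bracketed string.
Structure = str


def make_resolver(
    knowledge_base: Dict[str, Structure],
    predict: Callable[[str], Structure],
) -> Callable[[str], Tuple[Structure, str]]:
    """Return a function that resolves a word's structure and reports its source."""

    def resolve(word: str) -> Tuple[Structure, str]:
        # Single characters have no internal structure to annotate.
        if len(word) == 1:
            return word, "single-char"
        # Words covered by the knowledge base get their stored structure.
        hit = knowledge_base.get(word)
        if hit is not None:
            return hit, "lookup"
        # Everything else falls back to the predictor.
        return predict(word), "predicted"

    return resolve


def right_branching(word: str) -> Structure:
    """Placeholder predictor (assumption): nest characters to the right."""
    chars = list(word)
    tree = chars[-1]
    for ch in reversed(chars[:-1]):
        tree = f"({ch} {tree})"
    return tree


# Hypothetical toy knowledge base with one illustrative entry.
toy_kb = {"计算机": "((计 算) 机)"}
resolve = make_resolver(toy_kb, right_branching)

print(resolve("计算机"))  # structure comes from the toy knowledge base
print(resolve("研究生"))  # not in the toy knowledge base, so it is predicted
```

Keeping the source tag ("lookup", "predicted", "single-char") alongside the structure makes it straightforward to measure how much of a test set is answered by lookup rather than by the model.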
Typos Correction in Overseas Chinese Learning Based on Chinese Character Semantic Knowledge Graph
2
Authors: Jing Xiong, Xue Zhai, Zhan Zhang, Feng Gao. Journal of Data Analysis and Information Processing, 2023, No. 2, pp. 200-216 (17 pages)
In recent years, more and more foreigners have begun to learn Chinese characters, but they often make typos when writing Chinese. The fundamental reason is that they learn characters mainly through glyph and pronunciation without mastering their semantics. If learners understand the meaning of Chinese characters and form knowledge groups of characters with related meanings, learning efficiency can improve effectively. We pursue this goal by building a Chinese character semantic knowledge graph (CCSKG). In building the knowledge graph, the semantic computing capacity of HowNet was used, and 104,187 associated edges were finally established for 6,752 Chinese characters. Thanks to the development of deep learning, OpenHowNet releases the core data of HowNet and provides useful APIs for calculating the similarity between two words based on sememes. Our method therefore combines the advantages of data-driven and knowledge-driven approaches. The proposed method treats Chinese sentences as subgraphs of the CCSKG and uses graph algorithms to correct Chinese typos, with good results. The experimental results show that, compared with keras-bert and pycorrector + ernie, our method reduces the false acceptance rate by 38.28% and improves the recall rate by 40.91% in the field of learning Chinese as a foreign language. The CCSKG can help promote Chinese overseas communication and international education.
Keywords: Chinese Character Meaning; Knowledge Graph; Typos Correction; OpenHowNet; Semantic Relevancy
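The idea of treating a sentence as a subgraph of a character knowledge graph can be illustrated with a toy sketch. The abstract does not specify the graph algorithm, so everything below is an assumption made for illustration: the scoring rule (count context characters linked to a candidate), the confusion table of look-alike characters, and the function names `build_graph`, `support`, and `correct` are all hypothetical, and the two edges are hand-written rather than derived from HowNet.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (character, character) edges."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def support(graph: Dict[str, Set[str]], chars: List[str], i: int,
            candidate: str, window: int = 2) -> int:
    """Count context characters within `window` positions linked to `candidate`."""
    lo, hi = max(0, i - window), min(len(chars), i + window + 1)
    neighbours = graph.get(candidate, set())
    return sum(1 for j in range(lo, hi) if j != i and chars[j] in neighbours)


def correct(graph: Dict[str, Set[str]], confusion: Dict[str, List[str]],
            sentence: str, window: int = 2) -> str:
    """Replace a character only if a confusable candidate is strictly better supported."""
    chars = list(sentence)
    for i, ch in enumerate(chars):
        best, best_score = ch, support(graph, chars, i, ch, window)
        for cand in confusion.get(ch, []):
            score = support(graph, chars, i, cand, window)
            if score > best_score:
                best, best_score = cand, score
        # Later positions see the corrected character as context.
        chars[i] = best
    return "".join(chars)


# Hand-written toy data (assumption): two semantic edges and one confusable pair.
toy_graph = build_graph([("喝", "茶"), ("喝", "水")])
toy_confusion = {"荼": ["茶"], "茶": ["荼"]}

print(correct(toy_graph, toy_confusion, "我喜欢喝荼"))
```

Requiring a strictly higher score before replacing a character keeps already well-supported text unchanged, which is one simple way to limit false corrections in a sketch like this.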