Abstract: Starting from a redefinition of Chinese loan words and their corresponding types, this article advances the hypothesis that the ideographic trend of Chinese loan words is now accelerating to a greater degree than before. Drawing on the Prototype Models Theory for the types of Chinese loan words, it analyzes this trend in four aspects: the shift from transliteration loans to loan translations; the ideographic trend of transliteration loans; the full ideographic rendering of shift loan words; and word-for-word translation of loan words.
Funding: Supported by Seminar of National Social Funds Project (12&ZD234)
Abstract: Ancient Chinese characters, typically the ideographic characters on bones and bronze before the Shang Dynasty (16th to 11th century B.C.), are a valuable cultural legacy. However, recognizing ancient Chinese characters has long been a task reserved for paleography experts. With the help of modern computing techniques, anyone can expect to recognize the characters and understand the ancient inscriptions. This research aims to help people recognize and understand ancient Chinese characters by combining Chinese paleography theory with computer information processing technology. Based on an analysis of ancient character features, a method for structural character recognition is proposed. The important characteristics of strokes and of the basic components, or radicals, used in recognition are introduced in detail. A system based on this method was implemented to demonstrate its effectiveness.
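Abstract: Objective: Zero-shot Chinese character recognition (ZSCCR) has attracted wide attention because it can recognize unseen Chinese characters with zero or few training samples. Most existing ZSCCR methods adopt a radical-sequence matching framework: they first predict the radical sequence and then perform minimum edit distance (MED) matching against an ideographic description sequence (IDS) dictionary. However, existing MED algorithms assume by default that the substitution, insertion, and deletion costs are the same for all radicals, which makes the distance costs of candidate character classes ambiguous and redundant during matching. To address this, a character-aware edit distance (CAED) is proposed to match the target character class correctly. Methods: Several radical information extraction methods are designed to obtain finer-grained radical descriptions and hence more accurate radical substitution costs, improving the robustness and effectiveness of MED. In addition, a radical counting module is proposed to predict the number of radicals in a sample, forming a cost gate that constrains and adjusts the insertion and deletion costs and overcomes the effect of inaccurate IDS sequence length prediction. Results: Experiments were conducted on handwritten, scene, and ancient-book Chinese character datasets. Compared with previous methods, the proposed CAED improves accuracy on unseen character classes by 4.64%, 1.1%, and 5.08%, respectively, while maintaining comparable performance on seen character classes, which demonstrates the effectiveness of the method. Conclusion: The proposed character-aware edit distance allows the three edit costs (substitution, insertion, and deletion) to be adjusted adaptively according to the character, effectively improving recognition performance on unseen Chinese characters.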
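The matching step this abstract describes can be illustrated with a minimal sketch. It is not the paper's CAED: the toy IDS dictionary, the `SIMILAR` table, and the function names `edit_distance`, `similarity_cost`, and `match` are all assumptions made for illustration. The sketch shows how uniform costs can leave two candidates tied, and how a non-uniform substitution cost can break the tie.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Toy IDS dictionary (illustrative only): character -> ideographic description sequence.
IDS: Dict[str, List[str]] = {
    "妈": ["⿰", "女", "马"],
    "好": ["⿰", "女", "子"],
    "李": ["⿱", "木", "子"],
}

# Hypothetical table of visually similar radical pairs and their substitution cost.
SIMILAR = {frozenset(("孑", "子")): 0.2}


def similarity_cost(a: str, b: str) -> float:
    """Assumed non-uniform substitution cost: cheaper for look-alike radicals."""
    if a == b:
        return 0.0
    return SIMILAR.get(frozenset((a, b)), 1.0)


def edit_distance(
    pred: Sequence[str],
    target: Sequence[str],
    sub_cost: Optional[Callable[[str, str], float]] = None,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Weighted edit distance; with the defaults it is the uniform-cost MED."""
    if sub_cost is None:
        sub_cost = lambda a, b: 0.0 if a == b else 1.0
    n, m = len(pred), len(target)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + del_cost
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + del_cost,  # drop a predicted radical
                dp[i][j - 1] + ins_cost,  # add a missing radical
                dp[i - 1][j - 1] + sub_cost(pred[i - 1], target[j - 1]),
            )
    return dp[n][m]


def match(pred: Sequence[str], ids_dict: Dict[str, List[str]], **costs) -> str:
    """Return the dictionary character whose IDS is closest to the prediction."""
    return min(ids_dict, key=lambda ch: edit_distance(pred, ids_dict[ch], **costs))


# A predicted radical sequence in which 子 was misread as the similar-looking 孑.
pred = ["⿰", "女", "孑"]
uniform_scores = {ch: edit_distance(pred, seq) for ch, seq in IDS.items()}
weighted_best = match(pred, IDS, sub_cost=similarity_cost)
```

With uniform costs, 妈 and 好 are equally distant from the prediction, so the match is ambiguous. With the assumed similarity table, 好 becomes the unique nearest candidate. The `ins_cost` and `del_cost` parameters are plain constants here; the radical-count cost gate from the abstract is not modeled.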