Abstract
When software is used to process IPA-transcribed text from language survey questionnaires (proofreading the transcription, deriving the phonological system, and compiling homophone tables and syllable-morpheme lists), the key step is to segment each IPA string accurately into syllables, initials, finals and tones. A forward scan with minimal matching on digit characters splits the string into syllable strings and tone-digit strings; a forward scan with minimal matching on vowel characters then splits each syllable into its initial and final. On the basis of these segmentations, example characters and words can be drawn at random from the character list and the word list to generate phonological inventory tables, homophone tables and syllable-morpheme lists quickly, which greatly improves the efficiency of organizing corpus data in linguistic fieldwork.
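The segmentation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `VOWELS` set, and the assumption that tones are written as ASCII digits directly after each syllable are hypothetical choices made for illustration, and the sample entries are invented placeholders rather than survey data.

```python
import re
from collections import defaultdict

# Assumed vowel inventory (hypothetical; a real tool would need the full
# set of vowel symbols used in the survey's transcription scheme).
VOWELS = set("aeiouyɑɐɒæɛɜəɪɨʉɯʊɔœøɤʌɿʅ")


def split_syllables(ipa):
    """Scan left to right and cut a syllable at each run of tone digits.

    Returns a list of (syllable_body, tone_digits) pairs.
    Assumes tones are ASCII digits written right after the syllable.
    """
    return re.findall(r"([^0-9\s]+)([0-9]+)", ipa)


def split_initial_final(body):
    """Scan left to right; the final starts at the first vowel character."""
    for i, ch in enumerate(body):
        if ch in VOWELS:
            return body[:i], body[i:]
    # No vowel found (e.g. a syllabic nasal): treat the whole body as the final.
    return "", body


def homophone_table(entries):
    """Group labels by identical (syllable body, tone) pairs.

    `entries` is an iterable of (label, ipa) pairs, one syllable per entry.
    """
    table = defaultdict(list)
    for label, ipa in entries:
        for body, tone in split_syllables(ipa):
            table[(body, tone)].append(label)
    return dict(table)


if __name__ == "__main__":
    for body, tone in split_syllables("tsʰa55kuɔ214"):
        initial, final = split_initial_final(body)
        print(repr(initial), repr(final), tone)
    # Placeholder labels A, B, C stand in for characters from a character list.
    print(homophone_table([("A", "ma55"), ("B", "ma214"), ("C", "ma55")]))
```

Cutting at the first vowel keeps the initial as short as possible, which is one way to read "minimal vowel-character matching"; transcriptions with superscript tone letters or glide-initial conventions would need different rules.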
Source
《语言文字应用》
CSSCI
PKU Core Journal (北大核心)
2012, No. 2, pp. 137-143 (7 pages)
Applied Linguistics
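Funding
Jinan University Team Innovation Project "Theoretical and Practical Research on Building Audio Archives of Endangered Languages"
State Language Commission 2011 Twelfth Five-Year Plan research project "Research on Technologies for the Collection, Transmission and Integration of Audio Resources of Endangered Languages in China" (YB135-11)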
Keywords
IPA string segmentation
language fieldwork software
phonological inventory table
homophone table