Abstract
To address a common objection to knowledge association mining in specialized subdomains, namely that the results are already known without any mining, this paper proposes a scheme for mining non-common-sense knowledge associations that better fits the needs of professionals. The scheme rests on three key points. First, the data source is text in which professionals exchange experience rather than common-sense encyclopedic text, so that the mined results serve professional problem solving. Second, large-scale pre-trained word vectors are fine-tuned on a small subdomain corpus, which improves representation learning for domain terms and mitigates both the shortage of subdomain corpora and the poor learning of out-of-vocabulary technical terms. Third, a domain knowledge base is used to remove common-sense associations from the mined results, so that professionals receive potential, clue-like associations worth pursuing. Taking the cardiovascular field as an example, knowledge associations mined from a small corpus of physicians' experience-exchange texts correspond more closely to experience in solving difficult clinical problems and to findings from medical research experiments, and can offer professionals valuable clues for further knowledge exploration and use.
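The third key point, removing associations already recorded in a domain knowledge base, can be illustrated with a minimal sketch. It is not the authors' implementation. The term names, the toy two-dimensional vectors, the `non_common_associations` helper, the 0.8 threshold, and the use of cosine similarity as the association score are all assumptions made for illustration, and the fine-tuning of pre-trained vectors is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def non_common_associations(vectors, known_pairs, threshold=0.8):
    """Hypothetical helper: keep term pairs that are similar enough
    and NOT already listed in the knowledge base (known_pairs)."""
    terms = sorted(vectors)
    results = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score >= threshold and frozenset((a, b)) not in known_pairs:
                results.append((a, b, score))
    # Strongest remaining associations first.
    return sorted(results, key=lambda r: r[2], reverse=True)

# Made-up term vectors standing in for fine-tuned embeddings (assumption).
vectors = {
    "term_a": [1.0, 0.0],
    "term_b": [0.9, 0.1],
    "term_c": [0.8, 0.2],
    "term_d": [0.0, 1.0],
}
# Pairs the (hypothetical) domain knowledge base already records.
known_pairs = {frozenset(("term_a", "term_b"))}

candidates = non_common_associations(vectors, known_pairs)
for a, b, score in candidates:
    print(f"{a} ~ {b}: {score:.3f}")
```

In this toy run the pair `term_a`/`term_b` is strongly associated but is dropped because the knowledge base already lists it, leaving only the pairs that would count as candidate clues.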
Authors
肖璐
赵之辉
陈果
Xiao Lu; Zhao Zhihui; Chen Guo (School of Journalism, Nanjing University of Finance & Economics, Nanjing, 210023; School of Economics & Management, Nanjing University of Science and Technology, Nanjing, 210094)
Source
《信息资源管理学报》
CSSCI
2020, No. 6, pp. 101-109, 134 (10 pages in total)
Journal of Information Resources Management
Funding
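One of the research outcomes of the National Social Science Fund of China Youth Project "Research on Multi-dimensional Association Mining and Knowledge Aggregation in Academic Online Communities" (16CTQ025).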
Keywords
Knowledge association mining
Domain knowledge analysis
Pre-training
Representation learning
Small-scale corpus