In this paper we propose a novel model, the "recursive directed graph", based on feature structure, and apply it to represent the semantic relations of postpositive attributive structures in biomedical texts. The usage of postpositive attributives is complex and variable, especially in three categories: present participle phrases, past participle phrases, and prepositional phrases used as postpositive attributives, all of which make automatic parsing difficult. We summarize these categories and annotate their semantic information. Compared with dependency structure, feature structure, being a recursive directed graph, enhances semantic information extraction in the biomedical field. The annotation results show that the recursive directed graph is better suited to extracting complex semantic relations for biomedical text mining.
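As a rough illustration of what a feature structure read as a recursive directed graph might look like, the sketch below encodes the phrase "proteins expressed in liver cells", where a past participle phrase acts as a postpositive attributive. The `Node` class, the edge labels (`attribute`, `location`, `modifier`), and the example phrase are assumptions made for illustration; they are not the annotation scheme of the paper. The sketch also assumes the graph has no cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A head word plus labelled edges whose targets are themselves nodes."""
    head: str
    features: Dict[str, "Node"] = field(default_factory=dict)


def edges(node: Node) -> List[Tuple[str, str, str]]:
    """Walk the graph depth-first and list (source, label, target) triples."""
    out: List[Tuple[str, str, str]] = []
    for label, child in node.features.items():
        out.append((node.head, label, child.head))
        out.extend(edges(child))  # recursion: a feature value is again a node
    return out


# Hypothetical encoding of "proteins expressed in liver cells"
cells = Node("cells", {"modifier": Node("liver")})
expressed = Node("expressed", {"location": cells})
proteins = Node("proteins", {"attribute": expressed})

triples = edges(proteins)
for source, label, target in triples:
    print(f"{source} --{label}--> {target}")
```

Because a feature value can be another full node, the nesting depth follows the depth of the modification chain, which is the property the abstract contrasts with flat dependency arcs.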
To construct a highly efficient text clustering algorithm, the multilevel graph model and the refinement algorithm used in the uncoarsening phase are discussed. The model is applied to text clustering. The performance of the clustering algorithm can be improved by applying the refinement algorithm. The experimental results demonstrate that the multilevel graph text clustering algorithm is feasible. Key words: text clustering; multilevel coarsen graph model; refinement algorithm; high-dimensional clustering. CLC number: TP301. Foundation item: Supported by the National Natural Science Foundation of China (60173051). Biography: CHEN Jian-bin (1970-), male, Associate professor, Ph.D., research direction: data mining.
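The abstract does not say how the multilevel graph is coarsened. As a minimal sketch, the code below performs one coarsening pass with a greedy heaviest-edge-first matching, a strategy commonly used in multilevel graph partitioning and assumed here only for illustration. The function name `coarsen` and the toy similarity graph are hypothetical, and the refinement step of the uncoarsening phase is not shown.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def coarsen(weights: Dict[Edge, float]):
    """Merge endpoint pairs, heaviest edge first, and return the coarse graph.

    weights maps an undirected edge (u, v) to its weight.
    Returns (mapping, coarse) where mapping sends each vertex to its
    super-node (a tuple of merged vertices) and coarse holds summed weights.
    """
    matched = {}
    # Visit edges from heaviest to lightest; merge only if both ends are free.
    for (u, v), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        if u not in matched and v not in matched:
            matched[u] = matched[v] = (u, v)

    vertices = {n for edge in weights for n in edge}
    # Unmatched vertices become singleton super-nodes.
    mapping = {n: matched.get(n, (n,)) for n in vertices}

    coarse: Dict[tuple, float] = {}
    for (u, v), w in weights.items():
        a, b = mapping[u], mapping[v]
        if a != b:  # edges inside a super-node disappear
            key = tuple(sorted((a, b)))
            coarse[key] = coarse.get(key, 0) + w
    return mapping, coarse


# Toy document-similarity graph (weights are made up for illustration)
graph = {("a", "b"): 5, ("b", "c"): 1, ("c", "d"): 4, ("a", "d"): 1}
mapping, coarse = coarsen(graph)
print(coarse)
```

Repeating this pass yields progressively smaller graphs; a clustering found on the smallest one would then be projected back level by level, with refinement applied at each step.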
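General-purpose large language models (LLMs) often produce severe hallucinations in the domain of ancient mathematical texts because of inaccurate recognition of technical terms, misinterpretation of contextual relations, incomplete knowledge reasoning, and errors in calculation. These hallucinations include factual, faithfulness, and logical hallucinations. Retrieval-augmented generation (RAG), which introduces external domain text, is currently an effective way to mitigate factual hallucinations of LLMs in a specific domain. However, the content retrieved by RAG contains noise, and the knowledge fragments lack deep associations with one another, so RAG has limited ability to mitigate faithfulness and logical hallucinations. A domain knowledge graph (DKG) can link domain knowledge together. The authors therefore propose CogKAG (collaborative knowledge augmentation between domain knowledge graph and retrieval-augmented generation), a framework in which the DKG and RAG jointly augment knowledge, and build an agent for the ancient mathematical text 《九章算术》 (The Nine Chapters on the Mathematical Art). The CogKAG agent retrieves structured, interlinked domain knowledge from the DKG and unstructured domain text through RAG to construct a dynamic, structured, comprehensive context prompt. This strengthens the LLM's grasp of contextual relations and its logical reasoning and calculation abilities, and thereby mitigates faithfulness and logical hallucinations in this domain. Experimental results show that the CogKAG agent significantly reduces LLM hallucinations on ancient mathematical texts and thus improves performance on question answering (QA) tasks.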
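The idea of combining structured knowledge-graph facts with retrieved text in one prompt can be sketched as follows. This is a toy illustration, not the CogKAG implementation: the function names, the keyword-overlap retrieval, the prompt layout, and the two-entry knowledge base are all assumptions, and no LLM is called.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def retrieve_triples(question: str, triples: List[Triple]) -> List[Triple]:
    """Return triples whose subject or object is mentioned in the question."""
    q = question.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]


def retrieve_passages(question: str, passages: List[str], k: int = 1) -> List[str]:
    """Rank passages by the number of words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, triples: List[Triple], passages: List[str]) -> str:
    """Combine graph facts and retrieved text into one context prompt."""
    facts = "\n".join(f"- {s} | {r} | {o}" for s, r, o in retrieve_triples(question, triples))
    texts = "\n".join(f"- {p}" for p in retrieve_passages(question, passages))
    return (
        f"Structured knowledge:\n{facts}\n\n"
        f"Retrieved text:\n{texts}\n\n"
        f"Question: {question}"
    )


# Toy knowledge base (illustrative only)
kg = [("Fangtian", "topic", "field area"), ("Gougu", "topic", "right triangles")]
corpus = [
    "The Fangtian chapter treats the areas of fields.",
    "The Gougu chapter treats right triangles.",
]
question = "What does the Gougu chapter cover?"
prompt = build_prompt(question, kg, corpus)
print(prompt)
```

A real system would replace the keyword matching with entity linking against the graph and an embedding-based retriever, and would pass the resulting prompt to the LLM.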
Visualization methods for single documents are either too simple, considering word frequency only, or depend on syntactic and semantic information bases to be more useful. This paper presents an intermediate approach, based on H. P. Luhn's automatic abstract creation algorithm, that aims to bring more information into document visualization than word-counting methods do, without the need for external sources. The method takes pairs of relevant words and computes the linkage force between them. Relevant words become vertices and links become edges in the resulting graph.
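The abstract does not define "relevant word" or "linkage force". The sketch below assumes, purely for illustration, that a word is relevant when it is not a stopword and occurs at least `min_freq` times, and that the linkage force between two relevant words is the number of sentences in which both appear. The stopword list and the function name `build_graph` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Small illustrative stopword list (an assumption, not from the paper)
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def build_graph(text: str, min_freq: int = 2):
    """Return (vertices, edges) for a word-pair graph of a single document."""
    # Split into sentences, then into lowercase alphabetic tokens.
    sentences = [re.findall(r"[a-z]+", s.lower()) for s in re.split(r"[.!?]", text)]

    # Relevant words: frequent non-stopwords.
    freq = Counter(w for s in sentences for w in s if w not in STOPWORDS)
    vertices = {w for w, count in freq.items() if count >= min_freq}

    # Assumed linkage force: number of sentences shared by the two words.
    edges = Counter()
    for s in sentences:
        present = sorted(set(s) & vertices)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return vertices, dict(edges)


vertices, edges = build_graph("Graphs model text. Graphs link words. Words form text.")
print(vertices)
print(edges)
```

The resulting vertex set and weighted edge list can be handed to any graph layout tool for drawing.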
Funding: Supported by the National Natural Science Foundation of China (61202193, 61202304), the Major Projects of Chinese National Social Science Foundation (11&ZD189), and the Chinese Postdoctoral Science Foundation (2013M540593, 2014T70722).