Abstract: With the rise of artificial intelligence for science (AI4S), attributed graph optimization has gradually become a key link connecting graph machine learning with strategic emerging fields such as biomedicine and new materials, and it shows broad application prospects. Addressing the shortcomings of traditional optimization methods in integrating domain knowledge, and the resulting challenges of modeling divorced from real scenarios, inefficient black-box evaluation, and insufficient controllability of the optimization process, this study systematically surveys frontier techniques for attributed graph optimization oriented toward AI4S. The study first analyzes modeling and representation methods for attributed graphs, exploring how more precise graph representation mechanisms can improve a model's fit to specific scientific tasks. It then explains the basic principles of black-box optimization and deep surrogate models, discussing how to efficiently approximate the black-box evaluation process and thereby improve overall efficiency and accuracy. Finally, it focuses on the mechanisms by which large language models (LLMs) inject domain knowledge and assist decision-making, so as to enhance the interpretability and controllability of the optimization process. Research on attributed graph optimization for AI4S not only promotes deep interdisciplinary integration between computer science and other disciplines, but also accelerates the real-world deployment of graph machine learning for solving scientific problems across domains, creating greater economic and social benefits.
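The surrogate-model idea sketched in the abstract above can be made concrete in a few lines: call the expensive black-box evaluator on only a handful of candidates, then rank every candidate with a cheap learned stand-in. The nearest-neighbor surrogate and the toy quadratic objective below are illustrative assumptions only, far simpler than the deep surrogate models the survey covers.

```python
def surrogate_search(candidates, black_box, n_evals=5):
    """Sketch of surrogate-assisted black-box optimization:
    evaluate a few evenly spaced candidates with the expensive
    black box, then score all candidates with a cheap surrogate."""
    step = max(1, len(candidates) // n_evals)
    sampled = candidates[::step][:n_evals]
    observed = {c: black_box(c) for c in sampled}  # the only expensive calls

    # Cheapest possible surrogate: reuse the score of the
    # nearest evaluated point (a stand-in for a deep surrogate).
    def surrogate(c):
        nearest = min(observed, key=lambda s: abs(s - c))
        return observed[nearest]

    return max(candidates, key=surrogate)

# Toy black box: a quadratic peaked at 70, pretending each call is costly.
best = surrogate_search(list(range(100)), lambda c: -(c - 70) ** 2)
```

With only five black-box calls, the surrogate steers the search toward the high-scoring region around the true optimum, which is the efficiency gain the survey attributes to deep surrogate models.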
Abstract: Graph-structured data is ubiquitous in social networks, transportation systems, bioinformatics, and other scenarios. Graph neural networks (GNNs) use a message-passing mechanism to iteratively aggregate neighbor information, achieving strong performance in tasks such as node classification, link prediction, and graph classification. However, as data scales keep growing and application scenarios become more complex, GNNs face key challenges, including limited expressive power and insufficient generalization. In recent years, foundation models, represented by large language models (LLMs), have developed rapidly, demonstrating remarkable generalization and reasoning abilities and inspiring new directions for graph machine learning. Building on this, this study proposes the concept of the graph foundation model (GFM), aiming to obtain, through pre-training on large-scale graph data, a general-purpose model that can flexibly adapt to a variety of downstream tasks. It also systematically reviews recent research on graph foundation models, categorizing existing methods into three classes according to their degree of reliance on GNNs and LLMs, surveying the progress of each class, and describing the authors' own practical explorations in related directions. Finally, it discusses the key challenges and prospects for the future development of graph foundation models, in the hope of providing a reference for continued innovation in graph machine learning.
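The message-passing mechanism described in the abstract above can be sketched in a few lines. The mean-aggregation scheme, the ReLU nonlinearity, and the toy path graph below are illustrative assumptions, not the specific architecture of any surveyed model.

```python
import numpy as np

def message_passing_layer(adj, features, weight):
    """One round of mean-aggregation message passing:
    each node averages its neighbors' features (plus its own),
    then applies a shared linear transform and a ReLU."""
    # Add self-loops so each node keeps its own information.
    adj_hat = adj + np.eye(adj.shape[0])
    # Row-normalize: each node takes the mean over its neighborhood.
    deg = adj_hat.sum(axis=1, keepdims=True)
    aggregated = (adj_hat / deg) @ features
    return np.maximum(aggregated @ weight, 0.0)

# Toy graph: 3 nodes on a path (0-1, 1-2), with 2-dim features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
w = np.eye(2)  # identity transform, for illustration only
h = message_passing_layer(adj, x, w)
```

Stacking such layers is what lets a GNN aggregate information from multi-hop neighborhoods, and it is also the source of the expressiveness limits the abstract mentions.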
Funding: National Natural Science Foundation of China (No. 62176052).
Abstract: Large language models (LLMs) have demonstrated remarkable generalization abilities across multiple tasks in natural language processing (NLP). For multi-step reasoning tasks, chain-of-thought (CoT) prompting facilitates step-by-step thinking, leading to improved performance. However, despite significant advancements in LLMs, current CoT prompting performs suboptimally on smaller-scale models with fewer parameters. Additionally, the common paradigm of few-shot CoT prompting relies on a set of manual demonstrations, with performance contingent on the quality of these annotations and varying with task-specific requirements. To address these limitations, we propose a select-and-answer prompting (SAP) method to enhance language model performance on reasoning tasks without the need for manual demonstrations. The method comprises two primary steps: guiding the model to conduct a preliminary analysis and generate several candidate answers based on the prompt, then having the model provide a final answer derived from these candidates. The proposed prompting strategy is evaluated on two language models of different sizes and six datasets. On ChatGLM-6B, SAP consistently outperforms few-shot CoT across all datasets. For GPT-3.5, SAP achieves performance comparable to few-shot CoT and outperforms zero-shot CoT in most cases. These results indicate that SAP can significantly improve the accuracy of language models on reasoning tasks.
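The two-step structure described in the abstract above can be sketched as a pair of prompt templates around any text-generation function. The prompt wording and the mock generator below are illustrative assumptions; the paper's exact prompts are not reproduced here.

```python
def build_candidate_prompt(question, k=3):
    """Step 1: ask the model to analyze the question and
    propose several candidate answers, with no manual demonstrations."""
    return (f"Question: {question}\n"
            f"Analyze the question step by step and list "
            f"{k} candidate answers, one per line.")

def build_selection_prompt(question, candidates):
    """Step 2: ask the model to select a final answer from the candidates."""
    listing = "\n".join(f"- {c}" for c in candidates)
    return (f"Question: {question}\n"
            f"Candidate answers:\n{listing}\n"
            f"Select the single best answer from the candidates above.")

def select_and_answer(question, generate):
    """Two-call select-and-answer loop around a text-generation function."""
    raw = generate(build_candidate_prompt(question))
    candidates = [c.strip("- ").strip() for c in raw.splitlines() if c.strip()]
    return generate(build_selection_prompt(question, candidates))

# Mock generator standing in for an LLM API, for demonstration only.
def mock_generate(prompt):
    if "candidate answers, one per line" in prompt:
        return "- 11\n- 12\n- 13"
    return "12"

answer = select_and_answer("What is 5 + 7?", mock_generate)
```

Because both calls use zero-shot instructions, the loop needs no annotated demonstrations, which is the dependency on manual few-shot examples that SAP is designed to remove.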