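Abstract: As large language models (LLMs) become increasingly woven into daily life and emerge as key participants in human-computer interaction, accurately evaluating their values has become an important research topic. Such evaluation not only gauges model safety and supports responsible development in interactive settings, but also helps users find models that better match their personal values and provides a key guidance signal for aligning models with human values in human-computer interaction. Value evaluation, however, faces three complex challenges. The first is how to define suitable evaluation targets that accurately reveal the complex and pluralistic human values involved in interaction. The second is how to ensure the validity of the evaluation: existing static open-source datasets risk data contamination, and test samples that were once effective readily become obsolete as large models evolve rapidly; in addition, many existing works measure only a model's knowledge about values rather than its ability to act on values in real interaction scenarios, so the results struggle to reflect what users actually need from the model. The third is how to measure the results scientifically: value evaluation is usually multidimensional, requiring weighted integration across value dimensions while accounting for differing value priorities. To address these challenges, our team developed the Value Compass Benchmarks, which achieve scientific value evaluation through three innovative modules. First, evaluation targets are defined on the basis of basic human values from the social sciences, so that a limited set of core value dimensions reveals values comprehensively. Second, a generative, dynamically evolving evaluation framework is designed, in which a dynamic question generator produces evaluation samples in real time and a generative evaluation method analyzes how a model's values manifest in realistic situations. Finally, an evaluation metric is proposed that integrates the value dimensions by weighting and supports personalized, user-defined weights. We hope the platform will provide scientific, systematic value evaluation services and advance research on aligning model values.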
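The weighted integration of value dimensions mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual metric, so the plain weighted average below, the function name `weighted_value_score`, the dimension names, and all numbers are assumptions for illustration only.

```python
def weighted_value_score(scores, weights):
    """Combine per-dimension scores into one weighted average.

    scores  -- dict mapping a value dimension to a score (assumed in [0, 1])
    weights -- dict mapping the same dimensions to non-negative user weights
    """
    missing = set(scores) - set(weights)
    if missing:
        raise ValueError(f"no weight given for: {sorted(missing)}")
    total = sum(weights[d] for d in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalizing by the weight total keeps the result on the score scale.
    return sum(scores[d] * weights[d] for d in scores) / total


# Hypothetical dimension names and numbers, for illustration only.
scores = {"benevolence": 0.8, "security": 0.6}
user_weights = {"benevolence": 3.0, "security": 1.0}
overall = weighted_value_score(scores, user_weights)
```

Changing `user_weights` is how a user-specific priority over value dimensions would enter such a score; a user who cares only about one dimension can set every other weight to zero.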
Abstract: Offline policy evaluation, which evaluates and selects complex decision-making policies using only offline datasets, is important in reinforcement learning. At present, model-based offline policy evaluation (MBOPE) is widely adopted because it is easy to implement and performs well. MBOPE directly approximates the unknown value of a given policy using the Monte Carlo method, given the estimated transition and reward functions of the environment. Usually, multiple models are trained and one of them is then selected for use. However, selecting an appropriate model from those trained remains a challenge. The authors first analyse the upper bound of the difference between the approximated value and the unknown true value. Theoretical results show that this difference is related to the trajectories generated by the given policy on the learnt model and to the prediction error of the transition and reward functions at these generated data points. Based on the theoretical results, a new criterion is proposed to tell which trained model is better suited for evaluating the given policy. Finally, the effectiveness of the proposed criterion is demonstrated on both benchmark and synthetic offline datasets.
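The Monte Carlo step that MBOPE relies on can be sketched as follows: roll the policy out on the learnt transition and reward functions and average the discounted returns. This is a generic illustration, not the authors' implementation or their model-selection criterion; the function name `mc_policy_value` and the toy one-dimensional model are hypothetical.

```python
import random


def mc_policy_value(policy, transition, reward, initial_states,
                    horizon=50, gamma=0.99, n_rollouts=100, seed=0):
    """Estimate a policy's value by rolling it out on a learnt model.

    policy(s) -> a, transition(s, a) -> s', reward(s, a) -> float.
    All three callables stand in for the policy and the estimated models.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        s = rng.choice(initial_states)  # sample a start state
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            a = policy(s)
            ret += discount * reward(s, a)
            s = transition(s, a)
            discount *= gamma
        total += ret
    return total / n_rollouts  # average discounted return


# Toy deterministic model (assumed): s' = s + 0.5 * a, reward = -s^2.
value = mc_policy_value(
    policy=lambda s: -s,
    transition=lambda s, a: s + 0.5 * a,
    reward=lambda s, a: -(s * s),
    initial_states=[1.0],
    horizon=3, gamma=1.0, n_rollouts=10,
)
```

Because the rollouts visit states chosen by the policy on the learnt model, any model error at those states feeds directly into the estimate, which is the dependence the abstract's upper bound describes.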
Funding: This work was financially supported by the National Natural Science Foundation of China (Nos. 22205068, 22109144), the "CUG Scholar" Scientific Research Funds at China University of Geosciences (Wuhan) (Project No. 2022118), and the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. 162301202673).
Abstract: Aqueous Zn batteries are promising candidates for grid-scale renewable energy storage. Foil electrodes have been widely investigated and applied as anode materials for aqueous Zn batteries; however, they suffer from limited surface area and severe interfacial issues, including metallic dendrites and corrosion side reactions, which limit the depth of discharge (DOD) of the foil electrode materials. Herein, a low-temperature replacement reaction is utilized to construct in situ a three-dimensional (3D) corrosion-resistant interface for deeply rechargeable Zn foil electrodes. Specifically, the deliberately low-temperature environment controlled the replacement rate between polycrystalline Zn metal and oxalic acid, producing a Zn foil electrode with a distinct 3D corrosion-resistant interface (3DCI-Zn), which differed from the conventional two-dimensional (2D) protective structure and showed an order of magnitude higher surface area. Consequently, the 3DCI-Zn electrode exhibited dendrite-free and anticorrosion properties and achieved stable plating/stripping performance for 1000 h at 10 mA cm⁻² and 10 mAh cm⁻² with a remarkable DOD of 79%. After pairing with a MnO₂ cathode with a high areal capacity of 4.2 mAh cm⁻², the pouch cells delivered 168 Wh L⁻¹ and a capacity retention of 89.7% after 100 cycles with a low negative/positive (N/P) ratio of 3:1.
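Funding: National Natural Science Foundation of China (61502137); Fundamental Research Funds for the Central Universities (NJ2019010); Open Project of the State Key Laboratory for Novel Software Technology, Nanjing University (KFKT2018B20); 2019-20 Seed Research Fund of the Hong Kong Institute of Business Studies, Lingnan University, Hong Kong (190-009); Seed Research Fund of Lingnan University, Hong Kong (102367); LEO Dr David P. Chan Institute of Data Science, Lingnan University, Hong Kong; Open Fund of the Graduate Innovation Base (Laboratory), Nanjing University of Aeronautics and Astronautics (Kfjj20191601).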
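Abstract: Many natural language processing (NLP) tasks benefit from word vectors trained on large-scale corpora. Because pretrained word vectors carry general semantic features from a large corpus, applying them to a specific downstream task usually requires fine-tuning to update and adjust them so that they better suit the target task. However, low-frequency words in the target corpus lack training samples and therefore cannot obtain stable gradient information during fine-tuning, so their word vectors are not effectively updated. In short-text classification, these low-frequency words are nonetheless important indicators of the classification result, so obtaining better low-frequency word vector representations for a specific short-text classification task is necessary. To address this problem, the paper proposes a low-frequency word vector update algorithm that is independent of the downstream task model. Using a K-nearest-neighbor-based method for computing word vector offsets, it exploits the task-specific feature information acquired by high-frequency words that are similar to the low-frequency words in the general word vector space to guide the update of the low-frequency words, yielding low-frequency word vectors that are more accurate and suited to the current task context. With TextCNN as the baseline model and two general pretrained word vector sets obtained from word2vec and GloVe, the optimization algorithm is validated on three public short-text datasets. Experimental results show that after the low-frequency word representations are updated with the optimization algorithm, classification accuracy reaches 84.3% to 94%, an improvement of 0.4% to 1.4% over the accuracy before the update. This demonstrates the effectiveness of the algorithm, further confirms the influence of low-frequency words on classification results in short-text classification, and offers a useful reference for research on short-text classification.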
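One way to read the K-nearest-neighbor offset idea is sketched below: for each low-frequency word, find its K most similar high-frequency words in the pretrained space, then shift the low-frequency vector by the similarity-weighted average of how far those neighbors moved during fine-tuning. The abstract does not state the exact formula, so the cosine similarity, the weighting scheme, and the name `update_rare_vectors` are assumptions rather than the paper's algorithm.

```python
import numpy as np


def update_rare_vectors(pretrained, finetuned, rare_ids, frequent_ids, k=3):
    """Shift rare-word vectors by the average offset of similar frequent words.

    pretrained, finetuned -- (vocab, dim) arrays before and after fine-tuning
    rare_ids, frequent_ids -- lists of row indices
    Assumes every pretrained row has a nonzero norm.
    """
    unit = pretrained / np.linalg.norm(pretrained, axis=1, keepdims=True)
    # How far each frequent word moved during fine-tuning.
    offsets = finetuned[frequent_ids] - pretrained[frequent_ids]
    updated = finetuned.copy()
    for w in rare_ids:
        sims = unit[frequent_ids] @ unit[w]       # cosine similarities
        top = np.argsort(-sims)[:k]               # k nearest frequent words
        weights = np.clip(sims[top], 0.0, None)   # ignore negative similarity
        if weights.sum() == 0:
            continue                              # no usable neighbor
        shift = (weights[:, None] * offsets[top]).sum(axis=0) / weights.sum()
        updated[w] = pretrained[w] + shift
    return updated


# Toy vocabulary (assumed): rows 0 and 1 are frequent, row 2 is rare.
pre = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]])
fine = pre.copy()
fine[0] += [0.0, 1.0]   # offset learned by frequent word 0
fine[1] += [5.0, 5.0]   # offset learned by frequent word 1
new = update_rare_vectors(pre, fine, rare_ids=[2], frequent_ids=[0, 1], k=1)
```

In the toy case the rare word sits closest to word 0, so with `k=1` it inherits only that word's offset. The update touches only the embedding table, which is consistent with the abstract's claim that the approach does not depend on the downstream model.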
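Abstract: Training deep learning applications is computationally intensive and usually relies on graphics processing units (GPUs) for acceleration. However, deep learning frameworks tend to occupy a GPU exclusively, which wastes computing resources. To address this problem, this empirical study discusses the feasibility of two deep learning applications sharing one GPU for training, systematically analyzes the static and runtime characteristics of representative deep learning models, and shows how different model combinations and characteristics affect overall performance when two models are trained on a shared GPU. The principles summarized from the experimental results can serve as guidelines for improving scheduling efficiency and the utilization of cloud GPU resources.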
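Abstract: This paper examines a new challenge facing the development of artificial intelligence (AI) in the era of big models, represented by large language models (LLMs): the alignment of moral values. The rise of big models has greatly improved AI's ability to understand, generate, and control information and content, empowering a rich set of downstream applications. However, as big models become a foundation deeply intertwined with every aspect of human life, their intrinsic moral values and latent value tendencies pose unpredictable risks to human society. The paper first surveys the risks and challenges facing big models and introduces the current mainstream AI ethics guidelines and the moral problems corresponding to the limitations of big models. It then proposes re-examining, from the perspective of normative ethics, the various normative guidelines put forward in recent years, and calls on the academic community to collaborate in building a unified, universal AI moral framework. To further explore the moral tendencies of big models, it examines the moral value tendencies of current mainstream LLMs on the basis of moral foundations theory, reviews existing big-model alignment algorithms, and summarizes the unique challenges big models face in moral value alignment. To address these challenges, it proposes a new conceptual paradigm for aligning the moral values of big models and outlines promising research directions in three aspects: alignment dimensions, alignment evaluation, and alignment methods. Finally, it advocates building on interdisciplinary research, taking an important step toward future general AI that conforms to human moral values.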
Abstract: Natural language processing (NLP) is a subfield of artificial intelligence that focuses on enabling computers to understand and process human languages. In the last five years, we have witnessed the rapid development of NLP in tasks such as machine translation, question answering, and machine reading comprehension, based on deep learning and an enormous volume of annotated and unannotated data. In this paper, we review the latest progress in the neural network-based NLP framework (neural NLP) from three perspectives: modeling, learning, and reasoning. In the modeling section, we describe several fundamental neural network-based modeling paradigms, such as word embedding, sentence embedding, and sequence-to-sequence modeling, which are widely used in modern NLP engines. In the learning section, we introduce widely used learning methods for NLP models, including supervised, semi-supervised, and unsupervised learning; multitask learning; transfer learning; and active learning. We view reasoning as a new and exciting direction for neural NLP, but one that has yet to be well addressed. In the reasoning section, we review reasoning mechanisms, including knowledge, existing non-neural inference methods, and new neural inference methods. We emphasize reasoning in this paper because it is important for building interpretable and knowledge-driven neural NLP models that can handle complex tasks. At the end of this paper, we briefly outline our thoughts on the future directions of neural NLP.