Abstract: The Security Development Lifecycle (SDL) has two goals: to reduce the number of security-related design and coding defects, and to reduce the severity of those defects. SDL focuses mainly on the first two of these three principles. Secure by design means ensuring that the design and code are secure from the very beginning; secure by default is something you can never take for granted. In practice, it is impossible to write code that is one hundred percent correct; more on this topic appears in the later discussion of reducing the attack surface.
Abstract: Knowledge is important for text-related applications. In this paper, we introduce Microsoft Concept Graph, a knowledge graph engine that provides concept tagging APIs to facilitate the understanding of human languages. Microsoft Concept Graph is built upon Probase, a universal probabilistic taxonomy consisting of instances and concepts mined from the Web. We start by introducing the construction of the knowledge graph through iterative semantic extraction and taxonomy construction procedures, which extract 2.7 million concepts from 1.68 billion Web pages. We then use conceptualization models to represent text in the concept space to empower text-related applications, such as topic search, query recommendation, Web table understanding, and Ads relevance. Since its release in 2016, Microsoft Concept Graph has received more than 100,000 pageviews, 2 million API calls, and 3,000 registered downloads from 50,000 visitors across 64 countries.
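The core idea behind the conceptualization step described above can be sketched in a few lines: instances map to concepts with probabilities, and a short text is represented in concept space by accumulating the concept distributions of the instances it mentions. The tiny taxonomy below is invented for illustration and is not the actual Probase data or API.

```python
# Illustrative sketch of probabilistic concept tagging (Probase-style).
# The taxonomy and probabilities below are made up for demonstration;
# the real Microsoft Concept Graph is mined from billions of Web pages.

TAXONOMY = {
    "python": {"programming language": 0.8, "snake": 0.2},
    "java": {"programming language": 0.9, "island": 0.1},
}

def conceptualize(text):
    """Return a concept -> score map for the instances found in `text`,
    summing each matched instance's concept probabilities."""
    concepts = {}
    for instance, dist in TAXONOMY.items():
        if instance in text.lower():
            for concept, p in dist.items():
                concepts[concept] = concepts.get(concept, 0.0) + p
    return concepts
```

A text mentioning both "Python" and "Java" thus scores highest on the shared concept "programming language", which is what lets downstream applications such as topic search operate in concept space rather than on surface strings.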
Abstract: As large language models (LLMs) become increasingly integrated into human life and central to human-computer interaction, accurately evaluating their values has become an important research topic. Such evaluation not only measures a model's safety and supports its responsible development in interactive settings, but also helps users find models better aligned with their personal values, and provides key guidance signals for aligning models with human values in human-computer interaction. However, value evaluation faces three major challenges. First, how to define appropriate evaluation targets that accurately reveal the complex, pluralistic human values at play in interaction. Second, how to ensure the validity of the evaluation: existing static open-source datasets carry data-contamination risks, and once-valid test samples quickly become obsolete as large models evolve; moreover, much existing work measures only a model's knowledge of values rather than its ability to enact those values in real interactive scenarios, so the results fail to reflect what users actually need from a model. Third, how to measure evaluation results scientifically: value evaluation is typically multi-dimensional, requiring weighted integration across value dimensions and consideration of different value priorities. To address these challenges, our team built the Value Compass Benchmarks platform, which realizes scientific value evaluation through three novel modules. First, it defines evaluation targets based on basic human values from the social sciences, covering values comprehensively through a limited set of core value dimensions. Second, it designs a generative, dynamically evolving evaluation framework that produces test samples on the fly with a dynamic question generator and uses generative evaluation methods to analyze how models exhibit values in realistic situations. Third, it proposes an evaluation metric that integrates the value dimensions by weighting, with support for personalized, user-defined weights. We hope the platform will provide scientific, systematic value-evaluation services and advance research on model value alignment.
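The weighted-integration idea in the third module can be sketched as follows. The dimension names and weights are illustrative assumptions, not the benchmark's actual schema; the point is only that relative priorities are normalized into a single overall score.

```python
# Minimal sketch of weighted aggregation of multi-dimensional value
# scores with user-customizable weights. Dimension names are invented.

def aggregate_value_scores(scores, weights=None):
    """Weighted average of per-dimension scores in [0, 1].
    `weights` defaults to uniform and is normalized, so users only
    need to specify relative priorities."""
    if weights is None:
        weights = {dim: 1.0 for dim in scores}
    total = sum(weights[dim] for dim in scores)
    return sum(scores[dim] * weights[dim] for dim in scores) / total
```

Doubling the weight of one dimension (say, safety) shifts the overall score toward that dimension without requiring the weights to sum to one, which is one simple way to support the personalized value priorities the abstract describes.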
Abstract: Offline policy evaluation, that is, evaluating and selecting complex policies for decision-making using only offline datasets, is important in reinforcement learning. At present, model-based offline policy evaluation (MBOPE) is widely welcomed because it is easy to implement and performs well. MBOPE directly approximates the unknown value of a given policy using the Monte Carlo method, given estimated transition and reward functions of the environment. Usually, multiple models are trained, and then one of them is selected for use. However, a challenge remains in selecting an appropriate model from those trained for further use. The authors first analyse the upper bound of the difference between the approximated value and the unknown true value. Theoretical results show that this difference is related to the trajectories generated by the given policy on the learnt model and to the prediction error of the transition and reward functions at these generated data points. Based on the theoretical results, a new criterion is proposed to indicate which trained model is better suited to evaluating the given policy. Finally, the effectiveness of the proposed criterion is demonstrated on both benchmark and synthetic offline datasets.
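The MBOPE procedure and the flavor of the selection criterion can be sketched as follows. This is not the paper's actual method: the environment, the deterministic 1-D dynamics, and the use of one-step prediction error along policy-generated trajectories as a model-ranking proxy are all illustrative assumptions.

```python
# Hypothetical sketch of model-based offline policy evaluation (MBOPE):
# (1) estimate a policy's value by Monte Carlo rollout on a learnt model;
# (2) rank candidate models by their prediction error on the states the
#     policy actually visits. All dynamics here are toy and deterministic.

def rollout_value(policy, transition, reward, s0, horizon=50, gamma=0.99):
    """Monte Carlo estimate of the discounted return of `policy`
    under the (learnt) `transition` and `reward` functions."""
    s, value, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)
        value += discount * reward(s, a)
        s = transition(s, a)
        discount *= gamma
    return value

def model_selection_score(policy, model, true_step, s0, horizon=50):
    """Proxy for the paper's criterion: average one-step prediction error
    of the model along trajectories the policy generates on that model.
    Lower is better. `true_step(s, a) -> (s', r)` stands in for held-out
    offline data at the visited points."""
    transition, reward = model
    s, err = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        s_true, r_true = true_step(s, a)
        err += abs(transition(s, a) - s_true) + abs(reward(s, a) - r_true)
        s = transition(s, a)
    return err / horizon
```

The key point mirrored from the theory is that the error is measured on the data points the evaluated policy itself generates on the learnt model, not on the offline data distribution: a model that is accurate where the policy goes yields a tighter bound on the value estimate.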