Funding: Innovation and Technology Fund (ITF), Government of the Hong Kong Special Administrative Region (HKSAR), China (No. PRP-054-21FX).
Abstract: Dialogue policy learning (DPL) is a key component of a task-oriented dialogue (TOD) system. Its goal is to decide the system's next action at each turn, given the current dialogue state, based on a learned dialogue policy. Reinforcement learning (RL) is widely used to optimize this dialogue policy. In the learning process, the user is regarded as the environment and the system as the agent. In this paper, we present an overview of recent advances and challenges in dialogue policy from the perspective of RL. More specifically, we identify the problems and summarize the corresponding solutions for RL-based dialogue policy learning. In addition, we provide a comprehensive survey of applying RL to DPL by categorizing recent methods according to five basic elements of RL. We believe this survey can shed light on future research in DPL.
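The RL framing in this abstract (user as environment, system as agent) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the surveyed work: the two-slot booking task, the action names, the reward values, and the `user_step` simulator are hypothetical, and tabular Q-learning stands in for whichever RL algorithm a real system would use.

```python
import random

# Hypothetical system actions for a toy two-slot booking dialogue.
ACTIONS = ["request_slot_1", "request_slot_2", "book"]


def user_step(state, action):
    """Toy user simulator (the RL environment).

    state is (slot_1_filled, slot_2_filled). Returns (next_state, reward, done).
    """
    slot_1, slot_2 = state
    if action == 0:  # user answers the request for slot 1
        return (True, slot_2), -1.0, False
    if action == 1:  # user answers the request for slot 2
        return (slot_1, True), -1.0, False
    # action == 2: booking ends the dialogue; it succeeds only if both slots are filled
    success = slot_1 and slot_2
    return state, (10.0 if success else -5.0), True


def train(episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.2, max_turns=20, seed=0):
    """Tabular Q-learning for the dialogue policy (the RL agent)."""
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    states = [(a, b) for a in (False, True) for b in (False, True)]
    q = {(s, a): 0.0 for s in states for a in range(n_actions)}
    for _ in range(episodes):
        state = (False, False)
        for _ in range(max_turns):
            # Epsilon-greedy choice of the next system action.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[(state, a)])
            next_state, reward, done = user_step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    # Greedy policy: dialogue state -> index of the chosen system action.
    return {s: max(range(n_actions), key=lambda a: q[(s, a)]) for s in states}


policy = train()
for dialogue_state, action_index in sorted(policy.items()):
    print(dialogue_state, "->", ACTIONS[action_index])
```

In this toy setting the learned policy requests whichever slot is still missing and books only once both are filled, because the per-turn penalty discourages redundant requests and the failure penalty discourages premature booking.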
Abstract: Lifelong learning has drawn sustained attention from scholars. After reviewing the lifelong learning policies adopted in many countries and organizations, this paper analyzes the current state of lifelong learning policy in China, with the aim of meeting people's needs to live and develop, enriching their spiritual lives, and raising their quality of life.
Funding: Funded by the Science and Technology Program of State Grid, "Research of Interactive Control between Distributed Energy Resources and Mega-City Grids under Multi-constraints" (No. 5700-202311602A-3-2-ZN).
Abstract: Driven by the increasing penetration of intermittent renewable energy generation, modern power systems are promoting the integration of energy storage (ES) and advocating high-resolution dynamic security-constrained optimal power flow (DSCOPF) models to exploit ES time-shifting flexibility against contingencies and to respond promptly to more frequent variations in the system operating status. While pioneering research explores different methods for solving security-constrained optimal power flow (SCOPF) problems at individual time steps, real-time implementation of DSCOPF still faces challenges associated with uncertainty adaptation, complex constraint satisfaction, and computational efficiency. This paper proposes a physics-guided safe policy learning method, featuring an analytical evaluation model that provides accurate evaluations of both safety and cost-efficiency. A primal-dual-based learning procedure is developed to guide policy learning and foster prompt convergence. A spatial-temporal graph neural network is constructed to enhance perception of spatial-temporal uncertainties and improve policy generalization. Case studies on the IEEE 39-bus and 118-bus test systems validate the effectiveness and scalability of the proposed method in terms of safety, cost-efficiency, and computational performance, and highlight the value of enhanced perception.
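The abstract does not spell out its primal-dual learning procedure, so the following is only a generic sketch of the primal-dual (Lagrangian) idea on a hypothetical scalar problem: minimize the cost (x - 3)^2 subject to the safety constraint x <= 1. The problem, the step sizes, and the function name `primal_dual` are assumptions for illustration, not the paper's method.

```python
def primal_dual(steps=5000, lr_primal=0.05, lr_dual=0.05):
    """Gradient descent on x and projected gradient ascent on the multiplier.

    Lagrangian: L(x, lam) = (x - 3)**2 + lam * (x - 1), with lam >= 0.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: descend the Lagrangian with respect to x.
        grad_x = 2.0 * (x - 3.0) + lam
        x -= lr_primal * grad_x
        # Dual step: raise lam while the constraint is violated, keep it non-negative.
        lam = max(0.0, lam + lr_dual * (x - 1.0))
    return x, lam


x_opt, lam_opt = primal_dual()
print(f"x = {x_opt:.4f}, lambda = {lam_opt:.4f}")
```

The multiplier grows while the constraint is violated and settles once the primal variable sits on the constraint boundary (here x = 1 with multiplier 4, which follows from setting the derivative of the Lagrangian to zero). A policy-learning variant would replace x with policy parameters and the constraint with a safety evaluation.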
Abstract: With the globalization of English, the macro and micro cultures of English users around the world interact intensively. Under these conditions, the interface between local and global culture is an important issue that needs to be clarified in the materials and books used for learning English. The focus of this study was therefore to explore, in light of globalization and culture, the language learning policy of the new Iranian high-school English course book, Prospect 1, which was recently published and has been taught for a year in Iran. This qualitative study was conducted through semi-structured interviews. The participants were 30 teachers employed by the Ministry of Education who had taught Prospect 1 for a year; most were chosen from Mashhad and the rest from other cities of Khorasan province, Iran. The interview contained four main questions, which were posed to the teachers. The findings indicate that the language learning policy of Iran needs to pay more attention to learners' intercultural communicative competence, because it mainly aims to teach English with a focus on the home culture in the Iranian context. The article ends with some pedagogical implications and recommendations for further research.
Funding: Supported in part by the National Natural Science Foundation of China (No. 62106236).
Abstract: Embodied learning for object-centric robotic manipulation is a rapidly developing and challenging area in embodied AI. It is crucial for advancing next-generation intelligent robots and has garnered significant interest recently. Unlike data-driven machine learning methods, embodied learning focuses on robot learning through physical interaction with the environment and perceptual feedback, making it especially suitable for robotic manipulation. In this paper, we provide a comprehensive survey of the latest advancements in this field and categorize the existing work into three main branches: 1) embodied perceptual learning, which aims to predict object pose and affordance through various data representations; 2) embodied policy learning, which focuses on generating optimal robotic decisions using methods such as reinforcement learning and imitation learning; 3) embodied task-oriented learning, designed to optimize the robot's performance based on the characteristics of different tasks in object grasping and manipulation. In addition, we offer an overview and discussion of public datasets, evaluation metrics, representative applications, current challenges, and potential future research directions. A project associated with this survey has been established at https://github.com/RayYoh/OCRM_survey.
Funding: University of Wisconsin-Madison's Center for Connected and Automated Transportation (CCAT), a part of the larger CCAT consortium, a USDOT Region 5 University Transportation Center funded by the U.S. Department of Transportation, Award #69A3552348305. The contents of this paper reflect the views of the authors, who are responsible for the facts and the accuracy of the data presented herein, and do not necessarily reflect the official views or policies of the sponsoring organization.
Abstract: Model-based reinforcement learning (RL) is anticipated to exhibit higher sample efficiency than model-free RL by utilizing a virtual environment model. However, obtaining sufficiently accurate representations of environmental dynamics is challenging because of uncertainties in complex systems and environments. An inaccurate environment model may degrade the sample efficiency and performance of model-based RL. Furthermore, while model-based RL can improve sample efficiency, it often still requires substantial training time to learn from scratch, potentially limiting its advantages over model-free approaches. To address these challenges, this paper introduces a knowledge-informed model-based residual reinforcement learning framework that aims to enhance learning efficiency by infusing established expert knowledge into the learning process, so that learning does not begin from zero. Our approach integrates traffic expert knowledge into a virtual environment model, employing the intelligent driver model (IDM) for basic dynamics and neural networks for residual dynamics, thus ensuring adaptability to complex scenarios. We propose a novel strategy that combines traditional control methods with residual RL, facilitating efficient learning and policy optimization without the need to learn from scratch. The proposed approach is applied to connected automated vehicle (CAV) trajectory control tasks for the dissipation of stop-and-go waves in mixed traffic flows. The experimental results demonstrate that the proposed approach enables the CAV agent to achieve superior performance in trajectory control compared with the baseline agents in terms of sample efficiency, traffic flow smoothness, and traffic mobility.
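The split between IDM base dynamics and a learned residual can be sketched as follows. The IDM acceleration formula is the standard one from the traffic-flow literature; the parameter values are illustrative assumptions, and `residual_dynamics` is a hypothetical placeholder (returning zero) for the neural network the abstract mentions, whose architecture and inputs are not given here.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=2.0, s0=2.0, delta=4.0):
    """Intelligent driver model (IDM) acceleration in m/s^2.

    v   : own speed (m/s)
    gap : bumper-to-bumper distance to the leader (m), must be > 0
    dv  : approach rate, own speed minus leader speed (m/s)
    v0, T, a_max, b, s0, delta : desired speed, time headway, maximum
        acceleration, comfortable deceleration, minimum gap, exponent
        (illustrative values, not taken from the paper).
    """
    # Desired dynamic gap; the max(0, ...) clamp is a common implementation choice.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def residual_dynamics(v, gap, dv):
    """Hypothetical stand-in for a learned residual model; returns no correction."""
    return 0.0


def predicted_acceleration(v, gap, dv):
    """Knowledge-based prediction plus learned correction."""
    return idm_acceleration(v, gap, dv) + residual_dynamics(v, gap, dv)


# Free road at standstill: acceleration approaches a_max.
print(predicted_acceleration(v=0.0, gap=1e9, dv=0.0))
# Closing quickly on a nearby leader: strong braking (negative value).
print(predicted_acceleration(v=20.0, gap=10.0, dv=5.0))
```

Under this design the residual only has to capture what the IDM misses, which is one plausible reading of why such a model could need less data than learning the full dynamics from scratch.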
Funding: Supported by China's National Social Science Fund Education Youth Project entitled "Globalization and China's Education Governance Through the Lens of Policy Networks" (grant number CGA190250).
Abstract: Purpose: Drawing on a study of international schools in Shanghai, this study explores how external experiences and curricula are mobilized as policy tools to inspire local educational innovations and how these experiences are enacted differently by schools. Design/Approach/Methods: Based on a review of policy documents and interviews with school principals, senior management stakeholders, and teachers, this study identifies and compares the typologies of international schools in policy design and practice. Then, by deploying the network ethnography method following three key nodes, this study offers some explanations for the gaps between policy design and enactments. Findings: This study demonstrates the complex relations, interests, and struggles involved in constructing and shaping the meanings of international curricula within local education. The findings show the autonomy of policy networks and the difficulties of 'steering' them in a clear-cut way. Originality/Value: This study is one of the earliest attempts, if not the first, to experiment with the method of network ethnography in the context of China. These findings offer a nuanced account of the complex relations and ad hocery involved in policy learning.
Funding: Supported by Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: The National Institute of Standards and Technology (NIST) has identified natural language policies as the preferred expression of policy and implicitly called for an automated translation of ABAC natural language access control policy (NLACP) to a machine-readable form. To study the automation process, we consider the hierarchical ABAC model as our reference model, since it better reflects the requirements of real-world organizations. This paper therefore focuses on two questions: how can we automatically infer the hierarchical structure of an ABAC model given NLACPs, and how can we extract and define the set of authorization attributes based on the resulting structure? To address these questions, we propose an approach built upon recent advancements in natural language processing and machine learning techniques. For such a solution, the lack of appropriate data often poses a bottleneck. We therefore divide the primary contributions of this work into: (1) developing a practical framework to extract authorization attributes of a hierarchical ABAC system from natural language artifacts, and (2) generating a set of realistic synthetic NLACPs to evaluate the proposed framework. Our experimental results are promising: we achieved, on average, an F1-score of 0.96 when extracting the attribute values of subjects and 0.91 when extracting the attribute values of objects from natural language access control policies.
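For readers unfamiliar with the reported metric, the sketch below shows how an F1-score is computed for extracted attribute values. It assumes exact-match, set-based scoring on a single made-up policy sentence; the abstract does not state the matching or averaging scheme actually used, and the attribute names and values here are hypothetical.

```python
def f1_score(predicted, gold):
    """F1 over sets of extracted (attribute, value) pairs, using exact matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of extractions that are correct
    recall = true_positives / len(gold)          # share of gold pairs that were found
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical subject attributes for one policy sentence.
gold_pairs = {("role", "nurse"), ("department", "cardiology"), ("shift", "night")}
predicted_pairs = {("role", "nurse"), ("department", "cardiology"), ("rank", "senior")}

score = f1_score(predicted_pairs, gold_pairs)
print(round(score, 3))
```

Here two of three predictions are correct and two of three gold pairs are recovered, so precision, recall, and F1 all equal 2/3.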