Funding: supported by the National Basic Research Program of China (2013CB329603), the National Natural Science Foundation of China (61375058, 71231002), the China Mobile Research Fund (MCM 20130351), and the Ministry of Education of China and the Special Co-Construction Project of Beijing Municipal Commission of Education.
Abstract: Options are a promising method for discovering hierarchical structure in reinforcement learning (RL) and thereby accelerating learning. The key to option discovery is how an agent can autonomously find useful subgoals among the trajectories it traverses. Analyzing the agent's actions along these trajectories yields useful heuristics: not only does the agent pass through subgoals more frequently, but its effective actions at subgoals are also restricted. Consequently, subgoals can be identified as the states along the paths that best match this action-restricted property. In a grid-world environment, the concept of the unique-direction value (UDV), which reflects the action-restricted property, is introduced to find these best-matching action-restricted states. The UDV approach is used to form options autonomously, both offline and online. Experiments show that the approach finds subgoals correctly, and that Q-learning with options discovered in both the offline and online settings accelerates learning significantly.
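The abstract does not give the UDV formula, so the sketch below uses a hypothetical stand-in for it: score each visited state by its visit frequency times how concentrated the agent's actions were there, and take the top-scoring states as subgoal candidates. In a two-room grid world this favors the doorway, which every successful path crosses with the same action. The state and action names are illustrative, not from the paper.

```python
from collections import Counter, defaultdict

def subgoal_candidates(trajectories, top_k=1):
    """Rank states by an action-restriction score: visit count times the
    fraction of visits using the single most common action. States passed
    frequently and almost always with the same action score highest.
    (Illustrative stand-in for the paper's unique-direction value.)"""
    actions_at = defaultdict(Counter)  # state -> Counter of actions taken there
    for traj in trajectories:
        for state, action in traj:
            actions_at[state][action] += 1
    scores = {}
    for state, counts in actions_at.items():
        visits = sum(counts.values())
        restriction = counts.most_common(1)[0][1] / visits  # in (0, 1]
        scores[state] = visits * restriction  # frequent AND action-restricted
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy two-room grid world: every path funnels through the doorway state 'D',
# always leaving it with 'right'; actions elsewhere vary between visits.
trajs = [
    [('A', 'up'), ('B', 'right'), ('D', 'right'), ('E', 'down')],
    [('C', 'right'), ('B', 'down'), ('D', 'right'), ('F', 'up')],
    [('A', 'right'), ('D', 'right'), ('E', 'up')],
]
print(subgoal_candidates(trajs))  # -> ['D'], the doorway
```

A state found this way would then serve as the termination condition of an option, whose internal policy is learned to reach it.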
Funding: supported by the National Natural Science Foundation of China (60428303).
Abstract: A new coordination scheme for multi-robot systems is proposed. A state-space model of the multi-robot system is defined and constructed, incorporating the system's initial and goal states along with the task definition and the system's internal and external constraints. Task accomplishment is viewed as a transition of the system state in its state space (SS) under the system's constraints. Therefore, if a connectable path exists within the reachable area of the SS from the initial state to the goal state, the task is realizable. The optimal strategy for realizing the task under constraints is obtained by searching for the optimal state-transition trajectory of the robot system in the SS. Moreover, if no connectable path exists, meaning the task cannot be performed successfully, the task can be transformed into a realizable one by making the initial and goal states connectable and finding a path between them in the system's SS; this may be done by adjusting the system's configuration and/or the task constraints. Experiments on multi-robot formation control with obstacles in the environment are conducted, and simulation results show the validity of the proposed method.
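The realizability test described above can be illustrated on a discretized state space: search for a constraint-feasible path from the initial state to the goal state, and declare the task realizable exactly when one is found. The sketch below is a minimal breadth-first version on a toy 2-D space with an obstacle wall; the paper's actual SS model, discretization, and optimality criterion are not specified in the abstract, so these are assumptions for illustration.

```python
from collections import deque

def find_path(initial, goal, neighbors, feasible):
    """BFS in a discretized state space: return a state-transition
    trajectory from initial to goal through feasible (constraint-
    satisfying) states, or None if the goal is unreachable, i.e. the
    task is not realizable as posed."""
    frontier = deque([initial])
    parent = {initial: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:
            path = []
            while s is not None:  # reconstruct trajectory back to initial
                path.append(s)
                s = parent[s]
            return path[::-1]
        for n in neighbors(s):
            if n not in parent and feasible(n):
                parent[n] = s
                frontier.append(n)
    return None

# Toy 5x5 state space: an obstacle wall at x == 2 with a gap at y == 3.
obstacles = {(2, y) for y in range(5) if y != 3}
feasible = lambda s: 0 <= s[0] < 5 and 0 <= s[1] < 5 and s not in obstacles
neighbors = lambda s: [(s[0] + dx, s[1] + dy)
                       for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]

path = find_path((0, 0), (4, 0), neighbors, feasible)
print(path is not None)  # True: the wall has a gap, so the task is realizable
```

If `find_path` returns None, the abstract's transformation step corresponds to relaxing `feasible` or moving the initial/goal states until the two become connectable.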