Funding: This paper was prepared with the support of the National Youth Science Foundation.
Abstract: This paper deals with continuous-time Markov decision programming (CTMDP for short) with an unbounded reward rate. The economic criterion is the long-run average reward. For models with a countable state space and compact metric action sets, we present a set of sufficient conditions that ensure the existence of stationary optimal policies.
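For reference, the long-run average reward criterion named in this abstract is conventionally written as follows; the notation below is chosen for illustration and need not match the paper's own symbols.

```latex
% Long-run expected average reward of a policy \pi starting from state i
% (illustrative notation; the paper's own symbols may differ):
\[
  J(i,\pi) \;=\; \liminf_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}_{i}\!\left[\int_{0}^{T} r\bigl(x(t),a(t)\bigr)\,dt\right],
\]
% where x(t) is the controlled state process, a(t) the action prescribed by \pi,
% and r the (possibly unbounded) reward rate. A stationary policy \pi^* is
% average-reward optimal if J(i,\pi^*) \ge J(i,\pi) for every state i and policy \pi.
```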
Abstract: Aim: To investigate a model-free, multi-step, average-reward reinforcement learning algorithm. Methods: By combining the R-learning algorithm with temporal-difference learning (TD(λ) learning) for average-reward problems, a novel incremental algorithm, called R(λ) learning, was proposed. Results and Conclusion: The proposed algorithm is a natural extension of Q(λ) learning, the multi-step discounted-reward reinforcement learning algorithm, to the average-reward case. Simulation results show that R(λ) learning with intermediate λ values achieves a significant performance improvement over simple R-learning.
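The abstract does not spell out the update rules of R(λ) learning. The sketch below shows one common way to combine the R-learning average-reward update with TD(λ)-style eligibility traces in a tabular setting; the environment interface (reset/step) and all hyperparameters are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def r_lambda_learning(env, n_states, n_actions, episodes=500,
                      alpha=0.1, beta=0.01, lam=0.5, epsilon=0.1):
    """One common tabular way to combine R-learning with TD(lambda)-style
    eligibility traces. The abstract does not give the exact update rules,
    so this is an illustrative sketch, not the authors' algorithm; the
    env.reset()/env.step() interface is also an assumption."""
    Q = np.zeros((n_states, n_actions))      # relative action values
    rho = 0.0                                # running estimate of the average reward
    for _ in range(episodes):
        e = np.zeros_like(Q)                 # eligibility traces
        s = env.reset()
        done = False
        while not done:
            greedy_a = int(np.argmax(Q[s]))
            # epsilon-greedy action selection
            a = np.random.randint(n_actions) if np.random.rand() < epsilon else greedy_a
            s2, r, done = env.step(a)
            # R-learning TD error: reward measured relative to the average reward rho
            delta = r - rho + np.max(Q[s2]) - Q[s, a]
            e[s, a] += 1.0                   # accumulating trace
            Q += alpha * delta * e           # multi-step update through the traces
            if a == greedy_a:
                rho += beta * delta          # update rho only on greedy steps
                e *= lam                     # decay traces (no discount in the average-reward case)
            else:
                e[:] = 0.0                   # cut traces after exploratory actions (Watkins-style)
            s = s2
    return Q, rho
```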
Funding: This work was supported by the research fund of Hanyang University (HY-2019-N) and by the National Key Research & Development Program under Grant 2018YFA0701601.
Abstract: This paper proposes a reinforcement learning (RL) algorithm to find an optimal scheduling policy that minimizes the delay under a given energy constraint in a communication system where environment parameters, such as traffic arrival rates, are not known in advance and can change over time. For this purpose, the problem is formulated as an infinite-horizon Constrained Markov Decision Process (CMDP). To handle the constrained optimization problem, we first adopt the Lagrangian relaxation technique. Then, we propose a variant of Q-learning, Q-greedyUCB, that combines the ε-greedy and Upper Confidence Bound (UCB) algorithms to solve this constrained MDP. We mathematically prove that the Q-greedyUCB algorithm converges to an optimal solution. Simulation results also show that Q-greedyUCB finds an optimal scheduling strategy and is more efficient than Q-learning with ε-greedy, R-learning, and the Average-payoff RL (ARL) algorithm in terms of cumulative regret. We also show that our algorithm can learn and adapt to changes in the environment, so as to obtain an optimal scheduling strategy under a given power constraint in the new environment.
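To make the two ingredients named in this abstract concrete, here is a minimal sketch of Q-learning driven by a Lagrangian-relaxed reward and a UCB-plus-ε-greedy action choice. It is not the authors' Q-greedyUCB algorithm: the exact bonus term, update rule and average-reward handling are not given in the abstract, a discount factor is used only to keep the sketch simple, and the environment interface (step returning next state, delay cost and energy used) is an assumption.

```python
import numpy as np

def q_greedy_ucb_sketch(env, n_states, n_actions, lagrange_mult=1.0,
                        steps=100_000, alpha=0.1, gamma=0.99, c=2.0,
                        epsilon=0.05):
    """Illustrative sketch only: Q-learning on a Lagrangian-relaxed reward with
    a mixed epsilon-greedy / UCB action choice, loosely following the ideas
    described in the abstract (not the authors' exact Q-greedyUCB)."""
    Q = np.zeros((n_states, n_actions))
    counts = np.ones((n_states, n_actions))        # visit counts for the UCB bonus
    s = env.reset()
    for t in range(1, steps + 1):
        if np.random.rand() < epsilon:
            a = np.random.randint(n_actions)       # epsilon-greedy exploration
        else:
            bonus = c * np.sqrt(np.log(t) / counts[s])
            a = int(np.argmax(Q[s] + bonus))       # UCB-guided greedy choice
        s2, delay_cost, energy_used = env.step(a)  # assumed interface
        # Lagrangian relaxation: fold the energy constraint into the reward
        reward = -delay_cost - lagrange_mult * energy_used
        # discounted update used here purely for simplicity of the sketch
        Q[s, a] += alpha * (reward + gamma * np.max(Q[s2]) - Q[s, a])
        counts[s, a] += 1
        s = s2
    return Q
```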
Abstract: In a multi-stage manufacturing system, defective components are generated due to deteriorating machine parts and failure to install the feed load. In these circumstances, the system requires inspection counters to distinguish imperfect items and must take a number of discrete decisions to produce flawless items; at the same time, prioritising employee appreciation and rewards is one of the important policies for improving productivity. Here we model the multi-stage manufacturing system as an M/PH/1 queue, and rewards are given for using certain inspection strategies to produce quality items. A matrix-analytic method is proposed to describe a continuous-time Markov process in which reward points are assigned to the inspection strategy in each state of the system. By constructing the value functions of this dynamic programming model, we derive the optimal policy and the optimal long-run average reward of the entire system. In addition, we obtain the percentage of time spent in each system state for the probability of conformity and non-conformity of the product over the long term. The results of our computational experiments and case study suggest that the average reward increases due to the actions taken at each decision epoch for rework and disposal of non-conforming items.
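The abstract states that the optimal policy and long-run average reward are obtained from the value functions of a dynamic programming model. One standard way to compute these quantities for a finite, uniformized continuous-time Markov decision model is relative value iteration, sketched below; the transition and reward arrays are placeholders standing in for the M/PH/1 inspection model, not the paper's actual matrix-analytic construction.

```python
import numpy as np

def relative_value_iteration(P, r, tol=1e-8, max_iter=10_000):
    """Relative value iteration for a finite average-reward MDP.
    P[a] is the transition matrix under action a (e.g. from a uniformized CTMDP),
    r[a] the reward vector under action a; both are placeholders here.
    Returns the optimal long-run average reward g and a greedy policy."""
    n_actions, n_states = r.shape
    h = np.zeros(n_states)                       # relative value function
    for _ in range(max_iter):
        # one-step lookahead for every action: q[a, s] = r(s, a) + sum_j P(j|s,a) h(j)
        q = r + np.einsum('aij,j->ai', P, h)
        h_new = q.max(axis=0)
        g = h_new[0]                             # gain estimate from a reference state
        h_new = h_new - g                        # subtract the gain to keep values bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = q.argmax(axis=0)                    # greedy policy w.r.t. the final values
    return g, policy
```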
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61374080 and 61374067, the Natural Science Foundation of Zhejiang Province under Grant No. LY12F03010, the Natural Science Foundation of Ningbo under Grant No. 2012A610032, and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Abstract: This paper studies the strong n (n = −1, 0)-discount and finite-horizon criteria for continuous-time Markov decision processes in Polish spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Under mild conditions, the authors prove the existence of strong n (n = −1, 0)-discount optimal stationary policies by developing two equivalence relations: one between the standard expected average reward and strong −1-discount optimality, and the other between the bias and strong 0-discount optimality. The authors also prove the existence of an optimal policy for a finite-horizon control problem by developing an interesting characterization of a canonical triplet.
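The strong n-discount criteria are not defined in the abstract. For orientation, a conventional way of writing the underlying sensitive-discount notions (in the spirit of Veinott's n-discount optimality) is given below; the notation is illustrative, and the paper's precise "strong" definitions should be taken from the paper itself.

```latex
% alpha-discounted value of policy \pi starting from state i (illustrative notation):
\[
  V_\alpha(i,\pi) \;=\; \mathbb{E}^{\pi}_{i}\!\left[\int_{0}^{\infty}
     e^{-\alpha t}\, r\bigl(x(t),a(t)\bigr)\,dt\right], \qquad \alpha > 0 .
\]
% A policy \pi^* is n-discount optimal (n = -1, 0) if, for every state i and every policy \pi,
\[
  \liminf_{\alpha \downarrow 0}\; \alpha^{-n}
  \bigl[\, V_\alpha(i,\pi^*) - V_\alpha(i,\pi) \,\bigr] \;\ge\; 0 .
\]
% The case n = -1 is tied to the expected average reward and n = 0 to the bias,
% which is the pair of equivalences the paper develops for its stronger criteria.
```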