Journal Articles
117 articles found
1. Robust analysis of discounted Markov decision processes with uncertain transition probabilities (Cited by 3)
Authors: LOU Zhen-kai, HOU Fu-jun, LOU Xu-ming. Applied Mathematics (A Journal of Chinese Universities), SCIE/CSCD, 2020, No. 4, pp. 417-436 (20 pages)
Optimal policies in Markov decision problems may be quite sensitive with regard to transition probabilities. In practice, some transition probabilities may be uncertain. The goals of the present study are to find the robust range for a certain optimal policy and to obtain value intervals of exact transition probabilities. Our research yields powerful contributions for Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a method for estimating unknown transition probabilities based on maximum likelihood (see the sketch below). Since the estimation may be far from accurate, and the highest expected total reward of the MDP may be sensitive to these transition probabilities, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After giving the definition of a robust optimal policy with uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain the optimal policy. Finally, we define the value intervals of the exact transition probabilities and construct models to determine the lower and upper bounds. Numerical examples are given to show the practicability of our methods.
Keywords: Markov decision processes; uncertain transition probabilities; robustness and sensitivity; robust optimal policy; value interval
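The estimation step named in this abstract is conventionally the count-based maximum-likelihood estimator, p(s'|s,a) = N(s,a,s') / N(s,a). Below is a minimal sketch of that idea only, not the paper's code; the function name, data layout, and the uniform fallback for unvisited state-action pairs are illustrative assumptions.

```python
from collections import defaultdict

def mle_transition_estimates(trajectories, states, actions):
    """Count-based MLE: p_hat(s'|s,a) = N(s,a,s') / N(s,a)."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for s, a, s_next in trajectories:
        counts[(s, a)][s_next] += 1
        totals[(s, a)] += 1
    p_hat = {}
    for s in states:
        for a in actions:
            n = totals[(s, a)]
            # Fall back to a uniform distribution for unvisited (s, a) pairs.
            p_hat[(s, a)] = {t: (counts[(s, a)][t] / n if n else 1.0 / len(states))
                             for t in states}
    return p_hat

# Example: three observed transitions from state 0 under action "a".
data = [(0, "a", 0), (0, "a", 1), (0, "a", 1)]
print(mle_transition_estimates(data, states=[0, 1], actions=["a"]))
# (0, 'a') -> {0: 0.33.., 1: 0.66..}; unvisited (1, 'a') -> uniform
```

With few observations these estimates can be far from the true probabilities, which is precisely the motivation the abstract gives for the subsequent robustness analysis.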
2. Optimal Policies for Quantum Markov Decision Processes (Cited by 2)
Authors: Ming-Sheng Ying, Yuan Feng, Sheng-Gang Ying. International Journal of Automation and Computing, EI/CSCD, 2021, No. 3, pp. 410-421 (12 pages)
Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case (the classical baseline is sketched below). The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
Keywords: quantum Markov decision processes; quantum machine learning; reinforcement learning; dynamic programming; decision making
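The paper develops its dynamic programming for quantum MDPs; the sketch below shows only the classical finite-horizon backward induction that qMDP algorithms generalize, with hypothetical transition matrices and rewards.

```python
import numpy as np

def finite_horizon_backward_induction(P, R, H):
    """Classical finite-horizon DP: V_H = 0, then
    V_t(s) = max_a [ R[s,a] + sum_{s'} P[a][s,s'] * V_{t+1}(s') ].
    P: dict action -> (S x S) transition matrix; R: (S x A) reward array."""
    S, A = R.shape
    V = np.zeros(S)
    policy = []
    for t in reversed(range(H)):
        Q = np.stack([R[:, a] + P[a] @ V for a in range(A)], axis=1)  # (S, A)
        policy.append(Q.argmax(axis=1))  # greedy action per state at step t
        V = Q.max(axis=1)
    policy.reverse()  # policy[t][s] = optimal action at time t in state s
    return V, policy

# Toy 2-state, 2-action example (hypothetical numbers).
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.6, 0.4]])}
R = np.array([[1.0, 0.0], [0.0, 2.0]])
V0, pi = finite_horizon_backward_induction(P, R, H=3)
```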
3. Solving Markov Decision Processes with Downside Risk Adjustment (Cited by 1)
Authors: Abhijit Gosavi, Anish Parulekar. International Journal of Automation and Computing, EI/CSCD, 2016, No. 3, pp. 235-245 (11 pages)
Markov decision processes (MDPs) and their variants are widely studied in the theory of controls for stochastic discrete-event systems driven by Markov chains. Much of the literature focuses on the risk-neutral criterion in which the expected rewards, either average or discounted, are maximized. There exists some literature on MDPs that takes risks into account. Much of this addresses the exponential utility (EU) function and mechanisms to penalize different forms of variance of the rewards. EU functions have some numerical deficiencies, while variance measures variability both above and below the mean rewards; the variability above mean rewards is usually beneficial and should not be penalized or avoided. As such, risk metrics that account for pre-specified targets (thresholds) for rewards have been considered in the literature, where the goal is to penalize the risks of revenues falling below those targets. Existing work on MDPs that takes targets into account seeks to minimize risks of this nature. Minimizing risks can lead to poor solutions where the risk is zero or near zero, but the average rewards are also rather low. In this paper, hence, we study a risk-averse criterion, in particular the so-called downside risk, which equals the probability of the revenues falling below a given target, where, in contrast to minimizing such risks, we only reduce this risk at the cost of slightly lowered average rewards. A solution where the risk is low and the average reward is quite high, although not at its maximum attainable value, is very attractive in practice. To be more specific, in our formulation, the objective function is the expected value of the rewards minus a scalar times the downside risk (see the sketch below). In this setting, we analyze the infinite horizon MDP, the finite horizon MDP, and the infinite horizon semi-MDP (SMDP). We develop dynamic programming and reinforcement learning algorithms for the finite and infinite horizon. The algorithms are tested in numerical studies and show encouraging performance.
Keywords: downside risk; Markov decision processes; reinforcement learning; dynamic programming; targets; thresholds
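The risk-adjusted objective described above, E[R] - theta * P(R < target), can be estimated for any fixed policy by simple Monte Carlo. A sketch under assumed names follows; the paper's actual algorithms are dynamic programming and reinforcement learning, not simulation.

```python
import random

def downside_risk_objective(simulate_return, target, theta, n=10_000, seed=0):
    """Monte Carlo estimate of  E[R] - theta * P(R < target),
    the risk-adjusted score described in the abstract.
    simulate_return(rng) should sample one episode's total reward."""
    rng = random.Random(seed)
    returns = [simulate_return(rng) for _ in range(n)]
    mean = sum(returns) / n
    downside = sum(r < target for r in returns) / n  # empirical P(R < target)
    return mean - theta * downside

# Hypothetical return distribution standing in for a policy's episode returns.
score = downside_risk_objective(lambda rng: rng.gauss(10.0, 3.0),
                                target=6.0, theta=5.0)
```

The scalar theta trades average reward against the probability of falling below the target, matching the abstract's point that a small loss in mean reward can buy a large reduction in downside risk.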
4. Variance minimization for continuous-time Markov decision processes: two approaches (Cited by 1)
Author: ZHU Quan-xin. Applied Mathematics (A Journal of Chinese Universities), SCIE/CSCD, 2010, No. 4, pp. 400-410 (11 pages)
This paper studies the limit average variance criterion for continuous-time Markov decision processes in Polish spaces. Based on two approaches, this paper proves not only the existence of solutions to the variance minimization optimality equation and the existence of a variance minimal policy that is canonical, but also the existence of solutions to the two variance minimization optimality inequalities and the existence of a variance minimal policy which may not be canonical. An example is given to illustrate all of our conditions.
Keywords: continuous-time Markov decision process; Polish space; variance minimization; optimality equation; optimality inequality
5. Seeking for Passenger under Dynamic Prices: A Markov Decision Process Approach
Author: Qianrong Shen. Journal of Computer and Communications, 2021, No. 12, pp. 80-97 (18 pages)
In recent years, ride-on-demand (RoD) services such as Uber and Didi have become increasingly popular. Different from traditional taxi services, RoD services adopt dynamic pricing mechanisms to manipulate the supply and demand on the road, and such mechanisms improve service capacity and quality. Seeking-route recommendation has been widely studied for taxi services. In RoD services, the dynamic price is a new and accurate indicator that represents the supply and demand condition, but it has rarely been studied as a source of clues for drivers seeking passengers. In this paper, we propose to incorporate the impact of dynamic prices as a key factor in recommending seeking routes to drivers. We first show the importance and need to do so by analyzing real service data. We then design a Markov Decision Process (MDP) model based on passenger order and car GPS trajectory datasets, and take dynamic prices into account in designing rewards. Results show that our model not only guides drivers to locations with higher prices, but also significantly improves driver revenue. Compared with drivers' revenue before using the model, the maximum yield after using it can be increased by up to 28%.
Keywords: ride-on-demand service; Markov decision process; dynamic pricing; taxi services; route recommendation
6. Rationale for Decision-Making Processes in Enhancement of Community Participation for Sustainable Mangrove Management in Lamu, Kenya
Authors: Jamila Ahmed, Bessy Kathambi, Robert Kibugi. Open Journal of Ecology, 2023, No. 6, pp. 409-421 (13 pages)
Decision-making is the process of deciding between two or more options in order to take the most appropriate and successful course of action to achieve sustainable mangrove management. However, the distinctiveness of mangrove as an ecosystem, and thus the attendant socio-economic and governance ramifications, makes decision making here relatively distinct from other decision-making processes. As a result, the purpose of this research was to evaluate the role that community engagement plays in the decision-making process as it relates to the establishment of governance norms for sustainable mangrove management in Lamu County. In this study, a correlational research design was applied, and the researchers employed a mixed-methods approach. The target population was 296 respondents. The research used questionnaires and interviews to collect data. A descriptive statistical technique was utilized to inspect and analyze the gathered data. The findings indicated that having awareness of governance standards is beneficial during the process of making decisions. In addition, the findings demonstrated that respondents had the impression that the decision-making process was not done properly. On the other hand, the participants pointed out the positive aspects of the decision-making process and agreed that the participation of both genders was essential for the sustainable management of mangroves. Based on these data, it appeared that full community engagement in decision-making is necessary for sustainable management of mangrove forests.
Keywords: community engagement; sustainability; decision-making process; Lamu
7. A Comparative Analysis of Visualization Methods in Architecture: Employing Virtual Reality to Support the Decision-Making Process in the Architecture, Engineering, and Construction Industry
Authors: Ahmed Redha Gheraba, Debajyoti Pati, Clifford B. Fedler, Marcelo Schmidt, Michael S. Molina, Ali Nejat, Muge Mukaddes Darwish. Journal of Civil Engineering and Architecture, 2023, No. 2, pp. 73-89 (17 pages)
The design process of the built environment relies on the collaborative effort of all parties involved in the project. During the design phase, owners, end users, and their representatives are expected to make the most critical design and budgetary decisions, shaping the essential traits of the project; hence emerges the need to create and integrate mechanisms that support the decision-making process. Design decisions should not be based on assumptions, past experiences, or imagination. An example of the numerous problems that result from uninformed design decisions is "change orders", known as deviations from the original scope of work, which increase the overall cost and change the construction schedule of the project. The long-term aim of this inquiry is to understand user behavior and establish evidence-based control measures, which are actions and processes that can be implemented in practice to decrease the volume and frequency of change orders. The current study developed a foundation for further examination by proposing potential control measures and testing their efficiency, such as integrating Virtual Reality (VR). The specific aim was to examine the effect of different visualization methods (i.e., VR vs. construction drawings) on (1) how well the subjects understand the information presented about the future/planned environment; (2) the subjects' perceived confidence in what the future environment will look like; (3) the likelihood of changing the built environment; (4) design review time; and (5) accuracy in reviewing and understanding the design.
Keywords: virtual reality; construction change orders; architectural visualization; decision-making process; construction management; construction technology; interior environmental design
8. Heterogeneous Network Selection Optimization Algorithm Based on a Markov Decision Model (Cited by 9)
Authors: Jianli Xie, Wenjuan Gao, Cuiran Li. China Communications, SCIE/CSCD, 2020, No. 2, pp. 40-53 (14 pages)
A network selection optimization algorithm based on the Markov decision process (MDP) is proposed so that mobile terminals can always connect to the best wireless network in a heterogeneous network environment. Considering the different types of service requirements, the MDP model and its reward function are constructed based on the quality of service (QoS) attribute parameters of the mobile users, and the network attribute weights are calculated by using the analytic hierarchy process (AHP) (see the sketch below). The network handoff decision condition is designed according to the different types of user services and the time-varying characteristics of the network, and the MDP model is solved by using the genetic algorithm and simulated annealing (GA-SA); thus, users can seamlessly switch to the network with the best long-term expected reward value. Simulation results show that the proposed algorithm has good convergence performance and can guarantee that users with different service types obtain satisfactory expected total reward values and low numbers of network handoffs.
Keywords: heterogeneous wireless networks; Markov decision process; reward function; genetic algorithm; simulated annealing
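The AHP weighting step mentioned in the abstract conventionally takes attribute weights from the principal eigenvector of a pairwise comparison matrix. A generic sketch follows; the comparison values and attribute names are hypothetical, not taken from the paper.

```python
import numpy as np

def ahp_weights(pairwise):
    """Attribute weights as the normalized principal eigenvector of a
    pairwise comparison matrix (the standard AHP prescription)."""
    vals, vecs = np.linalg.eig(np.asarray(pairwise, dtype=float))
    principal = vecs[:, vals.real.argmax()].real
    w = np.abs(principal)
    return w / w.sum()

# Hypothetical 3-attribute comparison (e.g., delay vs. bandwidth vs. cost).
C = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
print(ahp_weights(C))  # weights summing to 1, largest for the first attribute
```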
9. Alunite processing method selection using the AHP and TOPSIS approaches under fuzzy environment (Cited by 4)
Authors: Alizadeh Shahab, Salari Rad Mohammad Mehdi, Bazzazi Abbas Aghajani. International Journal of Mining Science and Technology, SCIE/EI/CSCD, 2016, No. 6, pp. 1017-1023 (7 pages)
Alunite is the most important non-bauxite resource for alumina. Various methods have been proposed and patented for processing alunite, but none has been performed at industrial scale, and no technical, operational, or economic data are available to evaluate the methods. In addition, selecting the right approach for alunite beneficiation requires introducing a wide range of criteria and careful analysis of alternatives. In this research, after studying the existing processes, 13 methods were considered and evaluated against 14 technical, economic, and environmental criteria. Due to the multiplicity of processing methods and attributes, Multi Attribute Decision Making methods were employed to examine the appropriateness of the choices. The Delphi Analytical Hierarchy Process (DAHP) was used for weighting the selection criteria, and the fuzzy TOPSIS approach was used to determine the most profitable candidates (the crisp TOPSIS skeleton is sketched below). Among the 13 studied methods, the Spanish, Svoronos, and Hazan methods were respectively recognized to be the best choices.
Keywords: alunite; mineral processing methods; multi-attribute decision making; Delphi Analytical Hierarchy Process (DAHP); fuzzy TOPSIS
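The paper uses a fuzzy TOPSIS; shown below is only the crisp TOPSIS skeleton (vector-normalize, weight, distances to the ideal and anti-ideal solutions, relative closeness), with made-up criteria values, as a reference for the ranking logic.

```python
import numpy as np

def topsis(decision_matrix, weights, benefit_mask):
    """Crisp TOPSIS ranking. benefit_mask[j] is True if criterion j
    is to be maximized, False if it is a cost to be minimized."""
    X = np.asarray(decision_matrix, dtype=float)
    V = weights * X / np.linalg.norm(X, axis=0)          # weighted normalized
    ideal = np.where(benefit_mask, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit_mask, V.min(axis=0), V.max(axis=0))
    d_plus = np.linalg.norm(V - ideal, axis=1)
    d_minus = np.linalg.norm(V - anti, axis=1)
    return d_minus / (d_plus + d_minus)                   # higher is better

# Three hypothetical processing methods scored on cost (min) and recovery (max).
scores = topsis([[4.0, 0.80], [6.0, 0.92], [5.0, 0.85]],
                weights=np.array([0.4, 0.6]),
                benefit_mask=np.array([False, True]))
```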
10. Meaningful Update and Repair of Markov Decision Processes for Self-Adaptive Systems
Authors: Wen-Hua Yang, Min-Xue Pan, Yu Zhou, Zhi-Qiu Huang. Journal of Computer Science & Technology, SCIE/EI/CSCD, 2022, No. 1, pp. 106-127 (22 pages)
Self-adaptive systems are able to adjust their behaviour in response to environmental condition changes and are widely deployed as Internetwares. Considered a promising way to handle the ever-growing complexity of software systems, they have seen an increasing level of interest and cover a variety of applications, e.g., autonomous car systems and adaptive network systems. Many approaches for the construction of self-adaptive systems have been developed, and probabilistic models, such as Markov decision processes (MDPs), are among the favoured. However, the majority of them do not deal with the problems of the underlying MDP becoming obsolete under new environments or failing to satisfy the given properties. This results in the policies generated from such an MDP failing to guide the self-adaptive system to run correctly and meet its goals. In this article, we propose a systematic approach to updating an obsolete MDP by exploring new states and transitions and removing obsolete ones, and to repairing an unsatisfactory MDP by adjusting its structure in a more meaningful way rather than arbitrarily changing the transition probabilities to values not in line with reality. Experimental results show that the MDPs updated and repaired by our approach are more competent in guiding the self-adaptive systems' correct running compared with the original ones.
Keywords: self-adaptive system; Markov decision process; model repair
11. A Two-Level Hierarchical Markov Decision Model with Considering Interaction between Levels
Authors: LIU Dan, ZENG Wei, ZHOU Hongtao. Wuhan University Journal of Natural Sciences, CAS, 2013, No. 1, pp. 37-41 (5 pages)
Decisions in reality often have the characteristic of hierarchy because of the hierarchy of an organization's structure. In this paper, we propose a two-level hierarchical Markov decision model that considers the interactions of agents in different levels and the different time scales of the levels. A backward induction algorithm is given for the model to solve the optimal policy of the finite-stage hierarchical decision problem. The proposed model and its algorithm are illustrated with an example concerning a two-level hierarchical decision problem of infrastructure maintenance. The optimal policy of the example is solved, and the impacts of interactions between levels on decision making are analyzed.
Keywords: two-level hierarchical Markov decision processes; multi-time scale; backward induction
12. A Model-Based, Aspiration-Led Decision Support System NY-IEDSS
Author: Feng Shan (Dept. of Automatic Control Engineering, Huazhong University of Science and Technology, Wuhan 430074, China). Journal of Systems Engineering and Electronics, SCIE/EI/CSCD, 1991, No. 2, pp. 34-43 (10 pages)
An AI-aided simulation system embedded in a model-based, aspiration-led decision support system NY-IEDSS is reported. The NY-IEDSS is designed for the mid-term development strategic study of the Nanyang Region in Henan, China, and is getting beyond its prototype stage under the decision maker's (the end user's) orientation. The integration of the simulation model system, decision analysis, and expert system for decision support in the system implementation is reviewed. The intent of the paper is to provide insight into how system capability and acceptability can be enhanced by this integration. Moreover, emphasis is placed on problem orientation in applying the method.
Keywords: decision-making process; decision support system; aspiration-led DSS; intelligent front end; integrated knowledge base management system
13. Double-Factored Decision Theory for Markov Decision Processes with Multiple Scenarios of the Parameters
Author: Cheng-Jun Hou. Journal of the Operations Research Society of China, 2025, No. 2, pp. 484-514 (31 pages)
The double-factored decision theory for Markov decision processes with multiple scenarios of the parameters is proposed in this article. We introduce scenario belief to describe the probability distribution of scenarios in the system, and scenario expectation to formulate the expected total discounted reward of a policy (a sketch of the belief update appears below). We establish a new framework named the double-factored Markov decision process (DFMDP), in which the physical state and scenario belief are shown to be the double factors serving as the sufficient statistics for the history of the decision process. Four classes of policies for the finite-horizon DFMDPs are studied, and it is shown that there exists a double-factored Markovian deterministic policy which is optimal among all policies. We also formulate the infinite-horizon DFMDPs and present their optimality equation. An exact solution method named double-factored backward induction for the finite-horizon DFMDPs is proposed. It is used to find the optimal policies for the numerical examples and is then compared with policies derived from other methods in the related literature.
Keywords: dynamic programming; Markov decision process; parameter uncertainty; multiple scenarios of the parameters; double-factored Markov decision process
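A scenario belief of the kind the abstract introduces is naturally maintained by a Bayes update after each observed transition: b'(k) is proportional to b(k) * P_k(s'|s,a), where P_k is scenario k's transition model. A schematic sketch follows; the data layout and scenario names are assumptions, not the paper's notation.

```python
def update_scenario_belief(belief, transition_models, s, a, s_next):
    """Bayesian update of the scenario belief after observing (s, a, s'):
    posterior(k) is proportional to belief(k) * P_k(s' | s, a)."""
    posterior = {k: belief[k] * transition_models[k][(s, a)][s_next]
                 for k in belief}
    z = sum(posterior.values())
    return {k: v / z for k, v in posterior.items()} if z > 0 else belief

# Two hypothetical scenarios disagreeing about action "a" in state 0.
models = {"slow": {(0, "a"): {0: 0.9, 1: 0.1}},
          "fast": {(0, "a"): {0: 0.3, 1: 0.7}}}
b1 = update_scenario_belief({"slow": 0.5, "fast": 0.5}, models, 0, "a", 1)
# b1 now favors "fast": {'slow': 0.125, 'fast': 0.875}
```

Pairing this belief with the physical state gives exactly the two factors that the DFMDP framework treats as sufficient statistics for the history.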
14. Multi-constraint reinforcement learning in complex robot environments
Authors: Sheng HAN, Hengrui ZHANG, Hao WU, Youfang LIN, Kai LV. Frontiers of Computer Science, 2025, No. 8, pp. 105-107 (3 pages)
Constrained Reinforcement Learning (CRL), modeled as a Constrained Markov Decision Process (CMDP) [1,2], is commonly used to address applications with security restrictions. Previous works [3] primarily focused on the single-constraint issue, overlooking the more common multi-constraint setting, which involves extensive computation and combinatorial optimization of multiple Lagrange multipliers.
Keywords: constrained reinforcement learning; combinatorial optimization; multiple Lagrange multipliers; constrained Markov decision process; complex robot environments
15. Intelligent Scheduling of Virtual Power Plants Based on Deep Reinforcement Learning
Authors: Shaowei He, Wenchao Cui, Gang Li, Hairun Xu, Xiang Chen, Yu Tai. Computers, Materials & Continua, 2025, No. 7, pp. 861-886 (26 pages)
The Virtual Power Plant (VPP), as an innovative power management architecture, achieves flexible dispatch and resource optimization of power systems by integrating distributed energy resources. However, due to significant differences in the operational costs and flexibility of various types of generation resources, as well as the volatility and uncertainty of renewable energy sources (such as wind and solar power) and the complex variability of load demand, the scheduling optimization of virtual power plants has become a critical issue that needs to be addressed. To solve this, this paper proposes an intelligent scheduling method for virtual power plants based on Deep Reinforcement Learning (DRL), utilizing Deep Q-Networks (DQN) for real-time optimized scheduling of dynamic peaking units (DPU) and stable baseload units (SBU) in the virtual power plant (a minimal DQN update step is sketched below). By modeling the scheduling problem as a Markov Decision Process (MDP) and designing an optimization objective function that integrates both performance and cost, the scheduling efficiency and economic performance of the virtual power plant are significantly improved. Simulation results show that, compared with traditional scheduling methods and other deep reinforcement learning algorithms, the proposed method demonstrates significant advantages in key performance indicators: response time is shortened by up to 34%, task success rate is increased by up to 46%, and costs are reduced by approximately 26%. Experimental results verify the efficiency and scalability of the method under complex load environments and renewable energy volatility, providing strong technical support for the intelligent scheduling of virtual power plants.
Keywords: deep reinforcement learning; deep Q-network; virtual power plant; intelligent scheduling; Markov decision process
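As a rough illustration of the DQN machinery the abstract relies on, the following PyTorch sketch shows a small Q-network and one temporal-difference update on a replay batch. The state/action dimensions, network sizes, and batch format are assumptions; the paper's reward design and unit models are not reproduced.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping a VPP state (loads, renewable output, prices, ...)
    to Q-values over discrete dispatch actions; sizes are illustrative."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, x):
        return self.net(x)

def dqn_update(q, q_target, optimizer, batch, gamma=0.99):
    """One temporal-difference step on a replay batch (s, a, r, s', done)."""
    s, a, r, s_next, done = batch
    q_sa = q(s).gather(1, a.unsqueeze(1)).squeeze(1)       # Q(s, a) taken
    with torch.no_grad():                                  # frozen target net
        target = r + gamma * (1 - done) * q_target(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The separate target network and replay batch are the standard DQN stabilizers; any scheduling-specific reward shaping would enter through `r`.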
16. Deep Reinforcement Learning-based Multi-Objective Scheduling for Distributed Heterogeneous Hybrid Flow Shops with Blocking Constraints
Authors: Xueyan Sun, Weiming Shen, Jiaxin Fan, Birgit Vogel-Heuser, Fandi Bi, Chunjiang Zhang. Engineering, 2025, No. 3, pp. 278-291 (14 pages)
This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP) designed to minimize the total tardiness and total energy consumption simultaneously, and proposes an improved proximal policy optimization (IPPO) method to make real-time decisions for the DHHBFSP. A multi-objective Markov decision process is modeled for the DHHBFSP, where the reward function is represented by a vector with dynamic weights instead of the common objective-related scalar value. A factory agent (FA) is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve decision quality. Multiple FAs work asynchronously to allocate jobs that arrive randomly at the shop. A two-stage training strategy is introduced in the IPPO, which learns from both single- and dual-policy data for better data utilization. The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization (PPO), dispatch rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. Extensive experimental results suggest that the proposed strategies offer significant improvements over the basic PPO, and that the proposed IPPO outperforms state-of-the-art scheduling methods in both convergence and solution quality.
Keywords: multi-objective Markov decision process; multi-agent deep reinforcement learning; proximal policy optimization; distributed hybrid flow-shop scheduling; blocking constraints
17. Integrated System for Tube Bending Digital Manufacturing (Cited by 2)
Authors: 吕波, 唐承统, 宁汝新, 宋月英. Journal of Beijing Institute of Technology, EI/CAS, 2006, No. 2, pp. 127-132 (6 pages)
An integrated CAD/CAPP/CAM system for tube manufacturing based on an integration framework is presented. In this system, two kinds of data conventions describing tube shape are presented in the tube CAD subsystem; the object-oriented concept and the goal-driven inference mechanism are applied in the development of the knowledge-based CAPP subsystem; and simulation of tube processing in the tube bending simulation subsystem is performed based on the tube model's piecewise representation. A tube product case is considered to illustrate the application of the integrated system, and the advantages of the system for tube bending are shown.
Keywords: numerical control (NC); tube bending; CAD/CAPP/CAM integration; process decision; bending simulation
18. Effects of Regulatory Mode and Decision Role on Choice Deferral and the Underlying Mechanism: A Process-Tracing Perspective (Cited by 1)
Authors: 王怀勇, 邢晓雪, 岳思怡. 心理科学 (Journal of Psychological Science), CSSCI/CSCD/Peking University Core, 2023, No. 4, pp. 913-920 (8 pages)
From a process-tracing perspective, three experiments using the information board technique examined the influence of regulatory mode on choice deferral, as well as the mediating and moderating roles of information processing style (processing time, processing depth, processing pattern) and decision role. The results showed that: (1) regulatory mode affects individuals' choice deferral, with assessment-mode individuals more inclined to defer choices than locomotion-mode individuals; (2) processing time mediates the relationship between regulatory mode and choice deferral; (3) decision role moderates the relationships between regulatory mode and both processing time and choice deferral: when deciding for themselves, assessment-mode individuals showed longer processing times and a stronger tendency to defer than locomotion-mode individuals, whereas no significant difference emerged when deciding for others; (4) decision role moderates the mediating effect of processing time, a case of moderated mediation: when deciding for themselves, assessment-mode individuals' longer processing times led to a stronger tendency to defer, while the mediating effect of processing time was not significant when deciding for others. These findings shed light on the choice-deferral preferences and mechanisms of individuals with different regulatory modes, and on how to design effective marketing strategies for consumers who differ in regulatory mode.
Keywords: regulatory mode; decision role; choice deferral; processing time; information board
19. Incremental Multi-Step R-Learning
Authors: 胡光华, 吴沧浦. Journal of Beijing Institute of Technology, EI/CAS, 1999, No. 3, pp. 245-250 (6 pages)
Aim: To investigate the model-free multi-step average-reward reinforcement learning algorithm. Methods: By combining the R-learning algorithms with the temporal difference learning (TD(λ)) algorithms for average reward problems, a novel incremental algorithm, called R(λ) learning, was proposed (the one-step R-learning base case is sketched below). Results and Conclusion: The proposed algorithm is a natural extension of Q(λ) learning, the multi-step discounted-reward reinforcement learning algorithm, to the average reward case. Simulation results show that R(λ) learning with intermediate λ values makes significant performance improvements over simple R-learning.
Keywords: reinforcement learning; average reward; R-learning; Markov decision processes; temporal difference learning
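For reference, the one-step R-learning rule that R(λ) extends updates both the action values and an average-reward estimate ρ. A common variant is sketched below; the step sizes and dict-based layout are illustrative, not the paper's implementation.

```python
def r_learning_step(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One-step R-learning update (the base case that R(lambda) extends):
        Q(s,a) += alpha * (r - rho + max_a' Q(s',a') - Q(s,a)),
    with the average-reward estimate rho nudged when the taken action
    is greedy. Q maps state -> {action: value}; returns the updated rho."""
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r - rho + best_next - Q[s][a])
    if Q[s][a] == max(Q[s].values()):  # taken action is (now) greedy
        rho += beta * (r - rho + best_next - max(Q[s].values()))
    return rho

# Tiny two-state example with hypothetical values.
Q = {0: {"l": 0.0, "r": 0.0}, 1: {"l": 0.0, "r": 0.0}}
rho = r_learning_step(Q, rho=0.0, s=0, a="r", r=1.0, s_next=1)
```

Replacing the one-step backup with a TD(λ)-style eligibility trace over multiple steps is what turns this base rule into the R(λ) learning the abstract describes.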
20. Legislative Confirmation of Social Impact Assessment in the Administrative "Pre-Decision Process"
Authors: 胡戎恩, 石东坡. 辽宁大学学报(哲学社会科学版) (Journal of Liaoning University, Philosophy and Social Sciences Edition), Peking University Core, 2014, No. 1, pp. 112-119 (8 pages)
In recent years, the examination and analysis of the pre-decision process has increasingly become a focus in China's policy science and related social science research. Practice shows that the negative consequences revealed over the past decade by administrative decisions such as the "school closure and consolidation" policy serve as a continuing warning: the pre-decision stage must move toward openness and collaborative procedures, and the necessity, importance, and urgency of introducing and establishing social impact assessment within it are clear. Deploying more democratic and scientifically grounded procedural stages and technical methods in the pre-decision stage, so as to genuinely improve the acceptability and implementability of administrative decisions, is a key point of leverage and effectiveness in creating and refining the legal norms governing administrative decision-making. Since Hunan Province's administrative procedure regulations, legislative confirmation that absorbs and incorporates a social impact assessment system into the government's pre-decision process has become an option and a trend in the rule-of-law development of China's administrative procedures. This requires increasingly refined legislative design of administrative decision-making procedural norms, guided by a scientific concept of government decision-making.
Keywords: pre-decision process; school closure and consolidation; social impact assessment system; procedural norms; collaborative decision-making