Journal Articles
37 articles found
Starlet: Network defense resource allocation with multi-armed bandits for cloud-edge crowd sensing in IoT
1
Authors: Hui Xia, Ning Huang, Xuecai Feng, Rui Zhang, Chao Liu — Digital Communications and Networks, SCIE, CSCD, 2024, Issue 3, pp. 586-596 (11 pages)
The cloud platform has limited defense resources to fully protect the edge servers used to process crowd sensing data in the Internet of Things. To guarantee the network's overall security, we present a network defense resource allocation with multi-armed bandits to maximize the network's overall benefit. Firstly, we propose a method for dynamically setting node defense resource thresholds to obtain the defender (attacker) benefit function and distribution of the edge servers (nodes). Secondly, we design a defense resource sharing mechanism for neighboring nodes to obtain the defense capability of nodes. Subsequently, we use the decomposability and Lipschitz continuity of the defender's total expected utility to reduce the difference between the utility's discrete and continuous arms, and analyze this difference theoretically. Finally, experimental results show that the method maximizes the defender's total expected utility and reduces the difference between the discrete and continuous arms of the utility.
Keywords: Internet of Things; defense resource sharing; multi-armed bandits; defense resource allocation
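A minimal sketch of the discretized continuous-arm idea, assuming a plain UCB1 policy and a hypothetical defender_utility function (the paper's Starlet algorithm, its threshold setting and resource-sharing mechanism are not reproduced): under a Lipschitz reward, refining the grid shrinks the gap between the best discrete arm and the best continuous allocation level.

# Continuous defense-resource level in [0, 1] discretized into K arms, explored with UCB1.
import math, random

def defender_utility(level):          # hypothetical stand-in for the defender's benefit function
    return 0.6 * math.sin(3 * level) + 0.4 * level + random.gauss(0, 0.1)

K, T = 20, 5000
levels = [k / (K - 1) for k in range(K)]      # discretized arms
counts, means = [0] * K, [0.0] * K

for t in range(1, T + 1):
    if t <= K:                                # play each arm once
        arm = t - 1
    else:                                     # UCB1 index
        arm = max(range(K), key=lambda k: means[k] + math.sqrt(2 * math.log(t) / counts[k]))
    r = defender_utility(levels[arm])
    counts[arm] += 1
    means[arm] += (r - means[arm]) / counts[arm]

print("estimated best defense level:", levels[max(range(K), key=lambda k: means[k])])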
Distributed Weighted Data Aggregation Algorithm in End-to-Edge Communication Networks Based on Multi-armed Bandit (Cited by: 1)
2
Authors: Yifei ZOU, Senmao QI, Cong'an XU, Dongxiao YU — 《计算机科学》, CSCD, PKU Core, 2023, Issue 2, pp. 13-22 (10 pages)
As a combination of edge computing and artificial intelligence, edge intelligence has become a promising technique that provides its users with a series of fast, precise, and customized services. In edge intelligence, when learning agents are deployed on the edge side, the data aggregation from the end side to the designated edge devices is an important research topic. Considering the varying importance of end devices, this paper studies the weighted data aggregation problem in a single-hop end-to-edge communication network. Firstly, to make sure all the end devices with various weights are fairly treated in data aggregation, a distributed end-to-edge cooperative scheme is proposed. Then, to handle the massive contention on the wireless channel caused by end devices, a multi-armed bandit (MAB) algorithm is designed to help the end devices find their most appropriate update rates. Different from traditional data aggregation works, incorporating the MAB gives our algorithm higher efficiency in data aggregation. With a theoretical analysis, we show that the efficiency of our algorithm is asymptotically optimal. Comparative experiments with previous works are also conducted to show the strength of our algorithm.
Keywords: weighted data aggregation; end-to-edge communication; multi-armed bandit; edge intelligence
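A minimal sketch of the update-rate idea under simplifying assumptions of my own (not the paper's algorithm): each end device runs an independent UCB1 bandit over candidate transmission rates and is rewarded only when it is the sole transmitter in a slot, so the devices learn rates that keep channel contention low.

import math, random

N, T = 8, 20000
rates = [0.05, 0.1, 0.2, 0.4]                 # candidate update (transmission) rates
K = len(rates)
counts = [[0] * K for _ in range(N)]
means = [[0.0] * K for _ in range(N)]

def pick(i, t):
    for k in range(K):
        if counts[i][k] == 0:                 # try every rate once
            return k
    return max(range(K), key=lambda k: means[i][k] + math.sqrt(2 * math.log(t) / counts[i][k]))

for t in range(1, T + 1):
    choices = [pick(i, t) for i in range(N)]
    sending = [random.random() < rates[choices[i]] for i in range(N)]
    for i in range(N):
        reward = 1.0 if sending[i] and sum(sending) == 1 else 0.0   # success only without collision
        k = choices[i]
        counts[i][k] += 1
        means[i][k] += (reward - means[i][k]) / counts[i][k]

print("learned rates:", [rates[max(range(K), key=lambda k: means[i][k])] for i in range(N)])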
Strict greedy design paradigm applied to the stochastic multi-armed bandit problem
3
Author: Joey Hong — 《机床与液压》, PKU Core, 2015, Issue 6, pp. 1-6 (6 pages)
The process of making decisions is something humans do inherently and routinely, to the extent that it appears commonplace. However, in order to achieve good overall performance, decisions must take into account both the outcomes of past decisions and the opportunities of future ones. Reinforcement learning, which is fundamental to sequential decision-making, consists of the following components: (1) a set of decision epochs; (2) a set of environment states; (3) a set of available actions for transitioning between states; (4) state-action dependent immediate rewards for each action. At each decision epoch, the environment state provides the decision maker with a set of available actions from which to choose. As a result of selecting a particular action in that state, the environment generates an immediate reward for the decision maker and shifts to a different state and decision. The ultimate goal for the decision maker is to maximize the total reward after a sequence of time steps. This paper focuses on an archetypal example of reinforcement learning, the stochastic multi-armed bandit problem. After introducing the dilemma, I briefly cover the most common methods used to solve it, namely the UCB and εn-greedy algorithms. I also introduce my own greedy implementation, the strict-greedy algorithm, which more tightly follows the greedy pattern in algorithm design, and show that it runs comparably to the two accepted algorithms.
Keywords: greedy algorithms; allocation strategy; stochastic multi-armed bandit problem
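A minimal sketch of the two baseline solvers named in the abstract, UCB1 and a decaying ε-greedy, run on hypothetical Bernoulli arms; the author's strict-greedy variant is not reproduced here.

import math, random

probs = [0.2, 0.5, 0.7, 0.65]                 # hypothetical arm means
K, T = len(probs), 10000

def pull(k):
    return 1.0 if random.random() < probs[k] else 0.0

def run(select):
    counts, means, total = [0] * K, [0.0] * K, 0.0
    for t in range(1, T + 1):
        k = select(t, counts, means)
        r = pull(k)
        counts[k] += 1
        means[k] += (r - means[k]) / counts[k]
        total += r
    return total

def ucb1(t, counts, means):
    for k in range(K):
        if counts[k] == 0:
            return k
    return max(range(K), key=lambda k: means[k] + math.sqrt(2 * math.log(t) / counts[k]))

def eps_greedy(t, counts, means):             # epsilon_t = min(1, cK/t), a common decaying schedule
    if random.random() < min(1.0, 10 * K / t):
        return random.randrange(K)
    return max(range(K), key=lambda k: means[k])

print("UCB1 reward:", run(ucb1), " eps_n-greedy reward:", run(eps_greedy))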
Training a Quantum Neural Network to Solve the Contextual Multi-Armed Bandit Problem
4
Authors: Wei Hu, James Hu — Natural Science, 2019, Issue 1, pp. 17-27 (11 pages)
Artificial intelligence has permeated all aspects of our lives today. However, to make AI behave like real AI, the critical bottleneck lies in the speed of computing. Quantum computers employ the peculiar and unique properties of quantum states such as superposition, entanglement, and interference to process information in ways that classical computers cannot. As a new paradigm of computation, quantum computers are capable of performing tasks intractable for classical processors, thus providing a quantum leap in AI research and making the development of real AI a possibility. In this regard, quantum machine learning not only enhances the classical machine learning approach but, more importantly, provides an avenue to explore new machine learning models that have no classical counterparts. Qubit-based quantum computers cannot naturally represent the continuous variables commonly used in machine learning, since the measurement outputs of qubit-based circuits are generally discrete. Therefore, a continuous-variable (CV) quantum architecture based on a photonic quantum computing model is selected for our study. In this work, we employ machine learning and optimization to create photonic quantum circuits that can solve the contextual multi-armed bandit problem, a problem in the domain of reinforcement learning, which demonstrates that quantum reinforcement learning algorithms can be learned by a quantum device.
Keywords: continuous-variable quantum computers; quantum machine learning; quantum reinforcement learning; contextual multi-armed bandit problem
A Neural Bandits Recommendation Algorithm Incorporating Collaborative Filtering (Cited by: 3)
5
Authors: 张婷婷, 欧阳丹彤, 孙成林, 白洪涛 — 《吉林大学学报(理学版)》, CAS, PKU Core, 2024, Issue 1, pp. 92-99 (8 pages)
To address the limitations that data sparsity and the cold-start problem impose on collaborative filtering, and the fact that existing collaborative multi-armed bandit algorithms do not handle nonlinear reward functions, a neural Bandits recommendation algorithm incorporating collaborative filtering, COEENet, is proposed. First, a dual neural network structure is used to learn the expected reward and the potential gain; second, the collaborative effect of neighbors is taken into account; finally, a decision module is constructed to make the final decision. Experimental results show that the method outperforms four baseline algorithms in cumulative regret and achieves good recommendation performance.
Keywords: collaborative filtering; multi-armed bandit algorithm; recommender system; cold start
Diversity-Based Recruitment in Crowdsensing by Combinatorial Multi-Armed Bandits
6
Authors: Abdalaziz Sawwan, Jie Wu — Tsinghua Science and Technology, 2025, Issue 2, pp. 732-747 (16 pages)
Mobile Crowdsensing (MCS) represents a transformative approach to collecting data from the environment, as it utilizes the ubiquity and sensory capabilities of mobile devices with human participants. This paradigm enables scales of data collection critical for applications ranging from environmental monitoring to urban planning. However, effectively harnessing this distributed data collection capability faces significant challenges. One of the most significant is the variability in the sensing qualities of the participating devices, which are initially unknown and must be learned over time to optimize task assignments. This paper tackles the dual challenges of managing task diversity to mitigate data redundancy and optimizing task assignment amidst the inherent variability of worker performance. We introduce a novel model that dynamically adjusts task weights based on assignment frequency to promote diversity, and incorporates a flexible approach to account for the different qualities of task completion, especially in scenarios with overlapping task assignments. Our strategy aims to maximize the overall weighted quality of data collected within the constraints of a predefined budget, and it leverages a combinatorial multi-armed bandit framework with an upper confidence bound approach to guide decision-making. We demonstrate the efficacy of our approach through a combination of regret analysis and simulations grounded in realistic scenarios.
Keywords: diverse allocation; mobile crowdsensing; multi-agent systems; multi-armed bandits; online learning; worker recruitment
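A minimal sketch of the recruitment idea under assumptions of my own (hypothetical worker qualities, a simple 1/(1+n) diversity decay), not the paper's exact model: a combinatorial UCB rule picks m workers per round, and a weight that decays with each worker's assignment frequency discourages always reusing the same workers.

import math, random

qualities = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]    # hypothetical true worker qualities
N, m, T = len(qualities), 2, 3000
counts, means = [0] * N, [0.0] * N

def diversity_weight(i):
    return 1.0 / (1.0 + counts[i])            # decays with assignment frequency

for t in range(1, T + 1):
    def index(i):
        if counts[i] == 0:
            return float("inf")
        return (means[i] + math.sqrt(1.5 * math.log(t) / counts[i])) * diversity_weight(i)
    chosen = sorted(range(N), key=index, reverse=True)[:m]     # recruit the top-m workers
    for i in chosen:
        r = 1.0 if random.random() < qualities[i] else 0.0
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]

print("recruitment counts per worker:", counts)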
Stochastic programming based multi-arm bandit offloading strategy for internet of things
7
Authors: Bin Cao, Tingyong Wu, Xiang Bai — Digital Communications and Networks, SCIE, CSCD, 2023, Issue 5, pp. 1200-1211 (12 pages)
In order to solve the high latency of traditional cloud computing and the limited processing capacity of Internet of Things (IoT) users, Multi-access Edge Computing (MEC) migrates computing and storage capabilities from the remote data center to the edge of the network, providing users with computation services quickly and directly. In this paper, we investigate the impact of the randomness caused by the movement of the IoT user on offloading decisions, where the connection between the IoT user and the MEC servers is uncertain. This uncertainty is the main obstacle to assigning tasks accurately. Consequently, if the assigned task does not match the real connection time, a migration occurs (the connection time is not long enough to finish processing). To address the impact of this uncertainty, we formulate the offloading decision as an optimization problem considering transmission, computation, and migration. With the help of Stochastic Programming (SP), we use a posteriori recourse to compensate for inaccurate predictions. Meanwhile, in heterogeneous networks, considering that multiple candidate MEC servers could be selected simultaneously due to overlapping coverage, we also introduce Multi-Arm Bandit (MAB) theory for MEC selection. Extensive simulations validate the improvement and effectiveness of the proposed SP-based Multi-arm bandit Method (SMM) for offloading in terms of reward, cost, energy consumption, and delay. The results show that SMM can achieve about 20% improvement compared with the traditional offloading method that does not consider the randomness, and it also outperforms the existing SP/MAB-based method for offloading.
Keywords: multi-access computing; Internet of Things; offloading; stochastic programming; multi-arm bandit
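A minimal sketch of the MAB part only (the stochastic-programming recourse step of SMM is omitted, and the cost model below is made up): a UCB1 bandit learns which of several overlapping MEC servers tends to give the lowest offloading cost.

import math, random

true_cost = [0.3, 0.5, 0.45]                  # hypothetical mean normalized cost per server
K, T = len(true_cost), 4000
counts, means = [0] * K, [0.0] * K            # means track reward = 1 - observed cost

for t in range(1, T + 1):
    if t <= K:                                # try each server once
        s = t - 1
    else:
        s = max(range(K), key=lambda k: means[k] + math.sqrt(2 * math.log(t) / counts[k]))
    cost = min(1.0, max(0.0, random.gauss(true_cost[s], 0.1)))
    reward = 1.0 - cost
    counts[s] += 1
    means[s] += (reward - means[s]) / counts[s]

print("preferred MEC server:", max(range(K), key=lambda k: means[k]))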
Mobility-Aware User Scheduling in Wireless Federated Learning with Contextual Multi-Armed Bandit
8
Authors: Li Jun, Sun Haiyang, Deng Xiumei, Wei Kang, Shi Long, Liang Le, Chen Wen — China Communications, 2025, Issue 11, pp. 256-272 (17 pages)
Federated learning (FL) is an intricate and privacy-preserving technique that enables distributed mobile devices to collaboratively train a machine learning model. However, in real-world FL scenarios, the training performance is affected by a combination of factors, such as the mobility of user devices and limited communication and computational resources, making the user scheduling problem crucial. To tackle this problem, we jointly consider user mobility, communication, and computational capacities, and develop a stochastic optimization problem to minimize the convergence time. Specifically, we first establish a convergence bound on the training performance based on the heterogeneity of users' data, and then leverage this bound to derive the participation rate for each user. After deriving the user-specific participation rate, we aim to minimize the training latency by optimizing user scheduling under the constraints of energy consumption and participation rate. Afterward, we transform this optimization problem into the contextual multi-armed bandit framework based on the Lyapunov method and solve it with the submodular reward enhanced linear upper confidence bound (SR-linUCB) algorithm. Experimental results demonstrate the superiority of our proposed algorithm in training performance and time consumption compared with state-of-the-art algorithms for both independent and identically distributed (IID) and non-IID settings.
Keywords: contextual multi-armed bandit; federated learning; resource allocation; upper confidence bound; user scheduling
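A minimal sketch of a plain disjoint-model LinUCB scheduler; the paper's SR-linUCB adds a submodular reward term and Lyapunov-based constraints that are not reproduced here, and the contexts and reward parameter below are synthetic.

import numpy as np

d, K, T, alpha = 5, 10, 2000, 1.0
rng = np.random.default_rng(0)
theta_true = rng.normal(size=d)               # hypothetical unknown reward parameter
A = [np.eye(d) for _ in range(K)]             # per-user ridge-regression statistics
b = [np.zeros(d) for _ in range(K)]
picks, total = [0] * K, 0.0

for t in range(T):
    contexts = rng.normal(size=(K, d))        # per-user context (mobility, channel, compute, ...)
    scores = []
    for k in range(K):
        A_inv = np.linalg.inv(A[k])
        theta_hat = A_inv @ b[k]
        x = contexts[k]
        scores.append(x @ theta_hat + alpha * np.sqrt(x @ A_inv @ x))   # linear UCB score
    k = int(np.argmax(scores))
    reward = float(contexts[k] @ theta_true) + rng.normal(scale=0.1)
    A[k] += np.outer(contexts[k], contexts[k])
    b[k] += reward * contexts[k]
    picks[k] += 1
    total += reward

print("cumulative reward:", round(total, 1), "schedules per user:", picks)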
Drone-Truck Collaborative Delivery Based on a Multi-Armed Bandit Genetic Algorithm
9
Authors: 朱烨娜, 刘敏, 赵肄江, 陈萱霖 — 《计算机科学与探索》, PKU Core, 2025, Issue 8, pp. 2261-2272 (12 pages)
The new mode of collaborative delivery by drones and trucks, with its advantages of efficiency, environmental friendliness, and freedom from terrain constraints, is changing traditional logistics delivery. The traveling salesman problem with drones (TSP-D) is a classic problem in this new delivery mode; it is more complex than truck-only delivery and requires finding the optimal delivery combination from the cooperative interaction between drones and trucks, which brings new challenges. A hybrid genetic algorithm based on multi-armed bandits is proposed to solve the TSP-D. A chromosome encoding based on permutations of natural numbers is adopted and decoded with an exact partitioning method based on dynamic programming to generate drone-truck collaborative delivery solutions. A new multi-armed bandit local search strategy is designed, which treats the five different search operators in the local search operator pool as the "arms" of a bandit. The reward is computed from the improvement in solution fitness after pulling an arm, and the probability of each arm being selected is then computed with the ε-greedy reinforcement learning method, so that an appropriate search operator is chosen to strengthen the algorithm's local search capability. Experimental results show that, compared with other mainstream algorithms, the proposed algorithm achieves lower solution costs on most test instances of different distributions and scales. Further experimental analysis verifies that the multi-armed bandit local search strategy has better adaptability than other local search strategies and significantly improves the algorithm's performance. Finally, the proposed algorithm is applied to a real delivery case in Changsha, demonstrating its practical effectiveness.
Keywords: drone-truck collaborative delivery; traveling salesman problem with drones; hybrid genetic algorithm; multi-armed bandit
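A minimal sketch of the adaptive operator-selection idea with illustrative permutation operators rather than the paper's TSP-D operators: five local-search moves are treated as bandit arms, ε-greedy picks one, and its reward is the fitness improvement it produces.

import math, random

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(30)]   # hypothetical customer locations

def cost(perm):
    return sum(math.dist(pts[perm[i]], pts[perm[(i + 1) % len(perm)]]) for i in range(len(perm)))

def swap(p):
    i, j = random.sample(range(len(p)), 2)
    q = p[:]; q[i], q[j] = q[j], q[i]; return q

def reverse_seg(p):
    i, j = sorted(random.sample(range(len(p)), 2))
    return p[:i] + p[i:j + 1][::-1] + p[j + 1:]

def insert_move(p):
    i, j = random.sample(range(len(p)), 2)
    q = p[:]; q.insert(j, q.pop(i)); return q

def rotate(p):
    k = random.randrange(1, len(p))
    return p[k:] + p[:k]

def shuffle_block(p):
    i = random.randrange(len(p) - 3)
    block = p[i:i + 3]; random.shuffle(block)
    return p[:i] + block + p[i + 3:]

operators = [swap, reverse_seg, insert_move, rotate, shuffle_block]   # the bandit's "arms"
K, eps = len(operators), 0.2
counts, means = [0] * K, [0.0] * K
current = list(range(len(pts)))
random.shuffle(current)

for _ in range(5000):
    k = random.randrange(K) if random.random() < eps else max(range(K), key=lambda j: means[j])
    candidate = operators[k](current)
    gain = max(0.0, cost(current) - cost(candidate))     # reward = fitness improvement
    counts[k] += 1
    means[k] += (gain - means[k]) / counts[k]
    if gain > 0:
        current = candidate

print("final tour cost:", round(cost(current), 3), "operator pulls:", counts)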
Bandit-Learning-Based Cognitive Anti-Jamming Channel Selection for Aeronautical Swarm Networks (Cited by: 3)
10
Authors: 仇启明, 黎海涛, 张昊, 罗佳伟 — 《华中科技大学学报(自然科学版)》, EI, CAS, CSCD, PKU Core, 2021, Issue 5, pp. 20-25 (6 pages)
To address the problem that channel collisions degrade communication performance when an aeronautical swarm network (ASNET) uses cognitive anti-jamming spectrum access, this paper studies frequency-domain anti-jamming channel selection for aeronautical cognitive radios based on multi-armed bandit (MAB) theory. First, an MAB game model for anti-jamming channel selection in the aeronautical swarm network is constructed, and an algorithm for accurately estimating the number of radios in the dynamic swarm network is given. Then, based on this prior information, a collision-avoidance (CA) kl-UCB++ anti-jamming channel selection strategy is proposed, and a theoretical upper bound on the number of channel collisions is derived. Simulation results show that the proposed CA kl-UCB++ channel selection strategy reduces the collision probability of radio spectrum access and the cumulative regret, and effectively improves the frequency-domain anti-jamming communication performance of the aeronautical swarm network.
Keywords: aeronautical swarm network; channel selection; cognitive anti-jamming; kl-UCB++ algorithm; multi-armed bandit model
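A minimal sketch of a plain kl-UCB channel selector for Bernoulli "transmission succeeded" rewards; the kl-UCB++ refinement and the collision-avoidance mechanism of the paper are not reproduced, and the per-channel success rates are invented.

import math, random

def kl_bern(p, q):
    p = min(max(p, 1e-9), 1 - 1e-9)
    q = min(max(q, 1e-9), 1 - 1e-9)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, n, t):
    lo, hi = mean, 1.0
    for _ in range(30):                        # binary search for the upper confidence value
        mid = (lo + hi) / 2
        if kl_bern(mean, mid) <= math.log(t) / n:
            lo = mid
        else:
            hi = mid
    return lo

success_prob = [0.3, 0.8, 0.6, 0.5]           # hypothetical per-channel success rates under jamming
K, T = len(success_prob), 5000
counts, means = [0] * K, [0.0] * K

for t in range(1, T + 1):
    if t <= K:                                # probe each channel once
        c = t - 1
    else:
        c = max(range(K), key=lambda k: kl_ucb_index(means[k], counts[k], t))
    r = 1.0 if random.random() < success_prob[c] else 0.0
    counts[c] += 1
    means[c] += (r - means[c]) / counts[c]

print("channel usage counts:", counts)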
Using Bandit Algorithms to Solve the E&E Problem in Recommender Systems (Cited by: 1)
11
Author: 高海宾 — 《韶关学院学报》, 2017, Issue 9, pp. 22-26 (5 pages)
The exploration-and-exploitation (E&E) problem is widespread in the development and application of current recommender systems. This paper describes how the E&E problem arises in recommender systems and how it is classified, proposes the idea of using bandit algorithms to solve it, and focuses on the mathematical model of bandit algorithms and a bandit algorithm model built with the UCB strategy. The core simulation program is written in MATLAB, and the advantages and shortcomings of this algorithmic model are pointed out.
Keywords: bandit algorithm; recommender system; E&E problem
Applications of Continuous-Armed Bandit Algorithms Based on Gaussian Processes
12
Authors: 张慧铭, 周鹏杰, 王磊 — 《数学建模及其应用》, 2025, Issue 3, pp. 35-43 (9 pages)
In machine learning and AI, the continuous-armed bandit model is a black-box stochastic optimization model that, like the classical bandit problem, aims to strike a delicate balance between exploration and exploitation. Exploration reveals the stochastic character of the reward function by sampling points in the continuous action space; exploitation uses the available information to choose actions that maximize the expected payoff. This paper innovatively introduces the minimax optimal Thompson sampling algorithm for multi-armed bandits (MOTS) into a discretized continuous-armed bandit model and performs an empirical analysis on factory temperature and highway data. The results show that, for the discretized continuous-armed bandit model, MOTS outperforms the mainstream Gaussian process upper confidence bound (GP-UCB) and Gaussian process Thompson sampling (GP-TS) algorithms in average regret over long horizons; over short horizons, GP-TS outperforms GP-UCB, while MOTS is relatively weaker. The real-data scenarios not only test the effectiveness of the bandit algorithms but also highlight the deep application of statistical principles in reinforcement learning.
Keywords: reinforcement learning; multi-armed bandit model; exploration and exploitation; black-box optimization model; upper confidence bound algorithm; minimax Thompson sampling algorithm
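A minimal sketch of plain Gaussian Thompson sampling on a discretized action space with a made-up objective; the minimax-optimal clipping that distinguishes MOTS, and the GP-UCB/GP-TS baselines, are not reproduced.

import math, random

def reward(x):                                 # hypothetical black-box objective plus noise
    return math.exp(-8 * (x - 0.7) ** 2) + random.gauss(0, 0.1)

K, T, sigma = 25, 4000, 0.1
arms = [k / (K - 1) for k in range(K)]         # discretized continuous actions in [0, 1]
counts, means = [0] * K, [0.0] * K

for t in range(T):
    samples = []
    for k in range(K):
        if counts[k] == 0:
            samples.append(float("inf"))       # force one pull of every arm
        else:
            samples.append(random.gauss(means[k], sigma / math.sqrt(counts[k])))
    k = max(range(K), key=lambda j: samples[j])
    r = reward(arms[k])
    counts[k] += 1
    means[k] += (r - means[k]) / counts[k]

print("estimated optimum near x =", arms[max(range(K), key=lambda j: means[j])])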
An Adaptively Quantized Distributed Online Mirror Descent Algorithm with Bandit Feedback (Cited by: 1)
13
Authors: 谢俊如, 高文华, 谢奕彬 — 《控制理论与应用》, EI, CAS, CSCD, PKU Core, 2023, Issue 10, pp. 1774-1782 (9 pages)
Online distributed optimization of multi-agent systems is commonly used for optimization problems in dynamic environments, where nodes must transmit data streams in real time. In many situations, each node cannot obtain the full information of its individual objective function (including gradient information), and information transmission between nodes is subject to communication constraints. Considering the advantages of mirror descent, with its non-Euclidean projections, in handling high-dimensional data and large-scale online learning, this paper estimates the missing gradient information from the function values of the individual objective at two points, designs an adaptive quantizer according to the properties of mirror descent, and proposes an adaptively quantized distributed online mirror descent algorithm with bandit feedback. The relationship between the quantization error bound and the regret bound is then analyzed; with suitably chosen parameters, the proposed algorithm achieves a regret bound of O(√T). Finally, numerical simulations verify the effectiveness of the algorithm and the theoretical results.
Keywords: mirror descent algorithm; multi-agent systems; optimization; quantization; bandit feedback
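A minimal sketch of the two-point bandit gradient estimate used in this line of work, applied to a single node with an ordinary Euclidean gradient step; the adaptive quantizer and the distributed consensus step of the proposed algorithm are omitted, and the local loss below is hypothetical.

import math, random

def f(x):                                      # hypothetical local loss of one node
    return sum((xi - 0.5) ** 2 for xi in x)

def two_point_grad(f, x, delta):
    # g = d/(2*delta) * (f(x + delta*u) - f(x - delta*u)) * u, with u a random unit direction
    d = len(x)
    u = [random.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm for ui in u]
    f_plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    return [d / (2 * delta) * (f_plus - f_minus) * ui for ui in u]

x, delta = [0.0, 0.0, 0.0], 0.01
for t in range(2000):
    g = two_point_grad(f, x, delta)
    eta = 0.1 / math.sqrt(t + 1)               # diminishing step size
    x = [xi - eta * gi for xi, gi in zip(x, g)]

print("approximate minimizer:", [round(xi, 3) for xi in x])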
A Microblog Recommendation Model Combining User Clustering and Bandits Algorithms (Cited by: 1)
14
Authors: 何羽丰, 徐建民, 张彬 — 《小型微型计算机系统》, CSCD, PKU Core, 2022, Issue 10, pp. 2122-2130 (9 pages)
To address the new-user cold-start and data-sparsity problems in microblog recommender systems, a microblog recommendation model is proposed. The model builds complete user classes by clustering important users and classifying ordinary users, represents ordinary users' interests through class interests, uses a Bandits algorithm to generate microblog recommendation lists for the ordinary users in each complete user class, and updates the historical data of the corresponding complete user class according to the ordinary users' feedback on the recommendation lists. It thereby handles the new-user cold start reasonably, reduces data sparsity, achieves fairly accurate microblog recommendation, and offers a new approach to building microblog recommendation models. Experimental results show that the model can recommend posts that users are interested in, improving recommendation performance over existing random-exploration, confidence-interval, and probability-matching algorithms by at least 5.62%, 5.43%, and 33.37%, respectively.
Keywords: microblog recommendation; user clustering; bandits algorithm; cold start; data sparsity
An Intelligent Client Selection Algorithm for Federated Learning with Class Imbalance (Cited by: 3)
15
Authors: 朱素霞, 王云梦, 颜培森, 孙广路 — 《哈尔滨理工大学学报》, CAS, PKU Core, 2024, Issue 2, pp. 33-42 (10 pages)
In federated learning scenarios, when the data across client devices are non-IID, or even class-imbalanced, the optimization objective of each client's local model deviates from the global objective, which poses a major challenge to the performance of the global model. To address the challenge caused by this data heterogeneity, actively selecting a suitable subset of clients to balance the data distribution helps improve model performance. Therefore, an intelligent client selection algorithm for federated learning with class imbalance, FedSIMT, is designed. Without relying on any auxiliary dataset, and under the privacy premise that clients' local data remain invisible to the server, the algorithm uses the Tanimoto coefficient to measure the difference between the local data distribution and the target distribution, and adopts the combinatorial multi-armed bandit model from reinforcement learning to balance exploitation and exploration in client selection, improving the accuracy and convergence speed of the global model under different types of data heterogeneity. Experimental results demonstrate the effectiveness of the algorithm.
Keywords: federated learning; class imbalance; client selection algorithm; multi-armed bandit
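A minimal sketch of the two ingredients named in the abstract under assumptions of my own (synthetic label distributions, a noisy balance score as the reward), not the FedSIMT implementation: the Tanimoto coefficient scores how close a client's label distribution is to a uniform target, and a combinatorial UCB rule selects m clients per round.

import math, random

def tanimoto(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)

num_classes, num_clients, m, T = 5, 12, 3, 500
target = [1.0 / num_classes] * num_classes
# Hypothetical per-client label distributions (observed by the server only through noisy rewards).
clients = [[random.random() for _ in range(num_classes)] for _ in range(num_clients)]
clients = [[v / sum(c) for v in c] for c in clients]

counts, means = [0] * num_clients, [0.0] * num_clients
for t in range(1, T + 1):
    def ucb(i):
        if counts[i] == 0:
            return float("inf")
        return means[i] + math.sqrt(2 * math.log(t) / counts[i])
    selected = sorted(range(num_clients), key=ucb, reverse=True)[:m]
    for i in selected:
        reward = tanimoto(clients[i], target) + random.gauss(0, 0.05)   # noisy balance score
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]

print("most-selected clients:", sorted(range(num_clients), key=lambda i: counts[i], reverse=True)[:m])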
An Online Distributed Mirror Descent Algorithm Based on Bandit Feedback
16
Authors: 朱小梅, 李觉友 — 《西南大学学报(自然科学版)》, CAS, CSCD, PKU Core, 2022, Issue 1, pp. 99-107 (9 pages)
To address the difficulty of obtaining gradient information for a class of loss functions in online distributed optimization, an online distributed mirror descent algorithm with bandit feedback (ODMD-B) is proposed. First, the online distributed mirror descent (ODMD) algorithm is extended to the gradient-free setting, and a new method that estimates the gradient using only function-value information, namely bandit feedback, is proposed; its key idea is to approximate the gradient with loss function values, which effectively overcomes the difficulty that gradients are hard to obtain or computationally expensive. Then, a convergence analysis of the algorithm is given, showing that the convergence rate is O(√T), where T is the number of iterations. Finally, numerical simulations are carried out on a portfolio selection model. The results show that the convergence rate of ODMD-B is close to that of the existing ODMD algorithm. Compared with ODMD, the advantage of the proposed algorithm is that it uses only function-value information, which is cheaper to compute, making it better suited to optimization problems where gradient information is hard to obtain.
Keywords: online learning; distributed optimization; mirror descent algorithm; bandit feedback; regret bound
A Distributed Online Dual Averaging Algorithm Based on Bandit Feedback
17
Author: 朱小梅 — 《四川轻化工大学学报(自然科学版)》, CAS, 2020, Issue 3, pp. 87-93 (7 pages)
To solve distributed online optimization problems in which gradient information is difficult to obtain, a distributed online dual averaging algorithm with bandit feedback (DODA-B) is proposed. First, the algorithm improves on raw gradient feedback by introducing a new gradient estimate, namely bandit feedback, which approximates the gradient of the original loss function with function-value information and overcomes the heavy computation involved in evaluating gradients of complex functions. Then, a convergence analysis is given, showing that the regret bound converges at a rate of O(T^max{k, 1-k}), where T is the maximum number of iterations. Finally, numerical simulations are carried out on a special case of a sensor network; the results show that the convergence rate of the proposed algorithm is close to that of the existing distributed online dual averaging (DODA) algorithm. Compared with DODA, the advantage of the proposed algorithm is that it only uses function-value information, making it better suited to practical problems where gradient information is difficult to obtain.
Keywords: distributed online optimization; dual averaging algorithm; bandit feedback; regret bound
Residential HVAC Aggregation Based on Risk-averse Multi-armed Bandit Learning for Secondary Frequency Regulation (Cited by: 8)
18
Authors: Xinyi Chen, Qinran Hu, Qingxin Shi, Xiangjun Quan, Zaijun Wu, Fangxing Li — Journal of Modern Power Systems and Clean Energy, SCIE, EI, CSCD, 2020, Issue 6, pp. 1160-1167 (8 pages)
As the penetration of renewable energy continues to increase, stochastic and intermittent generation resources gradually replace conventional generators, bringing significant challenges to stabilizing power system frequency. Thus, aggregating demand-side resources for frequency regulation attracts attention from both academia and industry. However, in practice, conventional aggregation approaches suffer from the random and uncertain behaviors of the users, such as opting out of control signals. A risk-averse multi-armed bandit learning approach is adopted to learn the behaviors of the users, and a novel aggregation strategy is developed for residential heating, ventilation, and air conditioning (HVAC) to provide reliable secondary frequency regulation. Compared with the conventional approach, the simulation results show that the risk-averse multi-armed bandit learning approach performs better in secondary frequency regulation, with fewer users being selected and opting out of the control. Besides, the proposed approach is more robust to random and changing behaviors of the users.
Keywords: heating, ventilation and air conditioning (HVAC); load control; multi-armed bandit; online learning; secondary frequency regulation
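A minimal sketch of the risk-averse selection idea with an invented user model and a simple empirical mean-variance index plus a UCB-style exploration bonus; this is not the paper's aggregation strategy, only an illustration of penalizing high-variance (opt-out-prone) users.

import math, random

# Hypothetical users: (mean delivered fraction of requested regulation, opt-out probability)
users = [(0.9, 0.05), (0.95, 0.30), (0.7, 0.02), (0.85, 0.15)]
K, T, rho = len(users), 5000, 1.0             # rho = risk-aversion weight on variance
counts, sums, sq_sums = [0] * K, [0.0] * K, [0.0] * K

def index(k, t):
    if counts[k] == 0:
        return float("inf")
    mean = sums[k] / counts[k]
    var = max(0.0, sq_sums[k] / counts[k] - mean ** 2)
    return mean - rho * var + math.sqrt(2 * math.log(t) / counts[k])

for t in range(1, T + 1):
    k = max(range(K), key=lambda j: index(j, t))
    mean, p_opt_out = users[k]
    r = 0.0 if random.random() < p_opt_out else min(1.0, max(0.0, random.gauss(mean, 0.05)))
    counts[k] += 1
    sums[k] += r
    sq_sums[k] += r * r

print("selections per user:", counts)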
Design and Optimization of 5G-Based Cloud-Edge-End Collaborative Computing for Trains (Cited by: 1)
19
Authors: 徐建喜, 魏思雨, 李宗平 — 《太赫兹科学与电子信息学报》, 2024, Issue 11, pp. 1199-1208 (10 pages)
Urban rail transit plays an important role in relieving urban traffic congestion, and multi-train cooperative control of urban rail trains has become a research hotspot in recent years. Constrained by communication, multi-train cooperative computing tasks suffer from poorly balanced resource allocation, slow system response to environmental changes, and limited cooperative operation capability. Combining 5G communication with mobile edge computing (MEC) can effectively improve the real-time performance and accuracy of task processing and raise overall system performance. This paper designs an autonomous collaborative computing architecture for urban rail train operation control systems based on 5G and MEC. According to the characteristics of multi-train cooperative control tasks, the edge server selection problem in multi-train cooperative computation offloading is modeled as a multi-armed bandit (MAB) learning model, and a solution based on the upper confidence bound (UCB) algorithm is proposed to minimize the overall energy consumption and latency of the multi-train cooperative control system. Simulation results show that the proposed algorithm has significant performance advantages in average reward, optimal selection probability, average execution latency, and weighted total cost.
Keywords: multi-vehicle cooperation; mobile edge computing (MEC); 5G network; task offloading; multi-armed bandit (MAB) learning; upper confidence bound (UCB) algorithm
Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
20
Authors: Yifan Lin, Yuhao Wang, Enlu Zhou — Journal of Systems Science and Systems Engineering, SCIE, EI, CSCD, 2023, Issue 3, pp. 267-288 (22 pages)
In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For T rounds, K actions, and d-dimensional feature vectors, we prove a regret bound of O((1+ρ+1/ρ) d ln T ln(K/δ) √(dKT^(1+2ε) ln(K/δ)/ε)) that holds with probability 1-δ under the mean-variance criterion with risk tolerance ρ, for any 0<ε<1/2, 0<δ<1. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.
Keywords: multi-armed bandit; context; risk-averse; Thompson sampling
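A minimal, simplified sketch in the spirit of the paper (not its exact algorithm or analysis): disjoint-model Thompson sampling in which each arm keeps its own Bayesian linear regression and is scored by a posterior sample of its mean payoff minus ρ times its empirical payoff variance; the per-arm parameters and noise level are synthetic.

import numpy as np

d, K, T, rho, sigma2 = 4, 5, 3000, 0.5, 0.05
rng = np.random.default_rng(1)
theta_true = rng.normal(size=(K, d))           # hypothetical per-arm parameters

A = np.stack([np.eye(d) for _ in range(K)])    # per-arm posterior precision
b = np.zeros((K, d))
rewards = [[] for _ in range(K)]

for t in range(T):
    x = rng.normal(size=d)                     # shared context this round
    scores = np.empty(K)
    for k in range(K):
        cov = np.linalg.inv(A[k])
        mu = cov @ b[k]
        theta_s = rng.multivariate_normal(mu, sigma2 * cov)   # Thompson sample
        var_k = np.var(rewards[k]) if len(rewards[k]) > 1 else 0.0
        scores[k] = x @ theta_s - rho * var_k                 # mean-variance style score
    k = int(np.argmax(scores))
    r = float(x @ theta_true[k]) + rng.normal(scale=np.sqrt(sigma2))
    A[k] += np.outer(x, x)
    b[k] += r * x
    rewards[k].append(r)

print("pulls per arm:", [len(r) for r in rewards])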