In September 2024, a sharp rally in China's A-share market reignited the public's enthusiasm for stock trading. Yet the price movements that grip investors' attention are driven by many interrelated factors. For retail investors, algorithmic prediction models can complement information-based buying and selling and deliver far better results for the effort. Rudimentary computer-driven quantitative trading appeared as early as the start of the 1960s, and as the technology iterated, approaches built on statistics and model construction became the mainstream of quantitative trading. This paper constructs a stock trading model based on the A2C (Advantage Actor-Critic) reinforcement learning algorithm. It uses the gym-anytrading library to create a stock trading environment and the Stable-Baselines library to train a policy network that learns to trade in that environment so as to maximize profit. The model's data, Alibaba stock prices from Yahoo Finance covering December 2022 to September 2024, are obtained through the pandas-datareader interface.
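A minimal sketch of the pipeline this abstract describes, assuming current library versions (a gymnasium-based gym-anytrading and stable-baselines3, the maintained successor to Stable-Baselines); the window size, frame bounds, and training length below are illustrative choices, not values reported by the paper:

# Hypothetical sketch of the abstract's pipeline; hyperparameters are illustrative.
import gymnasium as gym
import gym_anytrading            # registers the 'stocks-v0' trading environment
import pandas_datareader as pdr
from stable_baselines3 import A2C  # maintained successor to Stable-Baselines

# Alibaba daily prices via pandas-datareader's Yahoo Finance interface, matching
# the paper's stated source and date range. (This Yahoo endpoint is known to be
# unstable; the call follows the interface the paper names.)
df = pdr.get_data_yahoo('BABA', start='2022-12-01', end='2024-09-30')

# Trading environment: the agent observes the last `window_size` bars and picks
# Buy/Sell actions; frame_bound selects the slice of the dataframe to trade on.
env = gym.make('stocks-v0', df=df, window_size=10, frame_bound=(10, len(df)))

# Train an A2C policy network to maximize cumulative trading profit.
model = A2C('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100_000)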
Effective, intelligent control of urban road traffic can relieve congestion, shorten travel times, and help maintain social stability, which gives the problem real theoretical and practical significance. To this end, a multi-agent Actor-Critic algorithm that accounts for intersection pressure is proposed. A reinforcement learning strategy for relieving intersection pressure is designed first, and a multi-agent Actor-Critic model based on deep neural networks is then constructed, with the Actor-Critic algorithm generating actions and evaluating them. Traffic networks are simulated on the SUMO (Simulation of Urban MObility) platform, and the method is compared against three conventional traffic-signal control algorithms. Experimental results show that the proposed method increases the number of arriving vehicles by 12% and the average vehicle speed by 5%, outperforming the baseline algorithms.
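The abstract does not spell out how "intersection pressure" is measured; a common definition (queued vehicles on incoming lanes minus those on outgoing lanes, as in max-pressure control) can be computed through SUMO's TraCI API. The sketch below is a hypothetical illustration of that reward signal, with net.sumocfg as a placeholder configuration; the paper's actual reward shaping and agent design may differ:

# Hypothetical intersection-pressure reward computed via SUMO's TraCI API.
import traci

def intersection_pressure(tls_id):
    """Pressure of one signalized intersection: halted vehicles on incoming
    lanes minus halted vehicles on outgoing lanes, summed over movements."""
    pressure = 0
    for links in traci.trafficlight.getControlledLinks(tls_id):
        for in_lane, out_lane, _via in links:
            pressure += (traci.lane.getLastStepHaltingNumber(in_lane)
                         - traci.lane.getLastStepHaltingNumber(out_lane))
    return pressure

traci.start(["sumo", "-c", "net.sumocfg"])  # placeholder SUMO configuration
for step in range(3600):                    # one simulated hour, 1 s per step
    traci.simulationStep()
    for tls_id in traci.trafficlight.getIDList():
        # One agent per intersection; a negative-pressure reward pushes the
        # actor-critic agents toward phases that relieve the intersection.
        reward = -intersection_pressure(tls_id)
        # agent.observe_and_act(tls_id, reward)  # actor-critic step (omitted)
traci.close()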
Device-to-Device (D2D) communication can improve spectrum utilization and system throughput, but interference between D2D links makes resource allocation difficult. In recent years, Deep Reinforcement Learning (DRL) has been widely applied to resource allocation in cellular communications. Accordingly, a resource allocation algorithm based on the Advantage Actor-Critic (A2C) method is proposed, which selects the best D2D resource allocation policy according to the state of the environment. Simulation experiments verify the algorithm's advantage in network performance: in comparisons with other algorithms, it is the most effective at improving system throughput. The algorithm thus offers a new solution to the D2D resource allocation problem in cellular networks and has broad application prospects.
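The abstract leaves the D2D state and action encodings unspecified, so the sketch below shows only the core A2C update it relies on, written as a generic PyTorch example with hypothetical dimensions: the critic estimates V(s), the advantage is A = r + γV(s') − V(s), and the actor is trained with the policy-gradient loss −log π(a|s)·A:

# Generic A2C update; state_dim and n_actions are hypothetical stand-ins for
# the paper's D2D channel/power state and allocation action spaces.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.actor = nn.Linear(64, n_actions)  # policy logits over allocations
        self.critic = nn.Linear(64, 1)         # state-value estimate V(s)

    def forward(self, s):
        h = self.body(s)
        return torch.distributions.Categorical(logits=self.actor(h)), self.critic(h)

def a2c_update(net, opt, s, a, r, s_next, gamma=0.99):
    dist, v = net(s)
    with torch.no_grad():
        _, v_next = net(s_next)
        target = r + gamma * v_next.squeeze(-1)   # bootstrapped return
    advantage = target - v.squeeze(-1)
    actor_loss = -(dist.log_prob(a) * advantage.detach()).mean()
    critic_loss = advantage.pow(2).mean()
    loss = actor_loss + 0.5 * critic_loss - 0.01 * dist.entropy().mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Usage (hypothetical dimensions):
#   net = ActorCritic(state_dim=16, n_actions=8)
#   opt = torch.optim.Adam(net.parameters(), lr=1e-3)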