Funding: Supported by the National Natural Science Foundation of China under Grant No. 62088101.
Abstract: Artificial intelligence empowers the rapid development of autonomous intelligent systems (AISs), but it still struggles to cope with open, complex, dynamic, and uncertain environments, which limits its large-scale industrial application. Reliable human feedback provides a mechanism for aligning machine behavior with human values and holds promise as a new paradigm for the evolution and enhancement of machine intelligence. This paper analyzes the engineering insights from ChatGPT and traces the evolution from traditional feedback to human feedback. It then proposes a unified framework for self-evolving intelligent driving (ID) based on human feedback. Finally, an application in a congested ramp scenario illustrates the effectiveness of the proposed framework.
Abstract: The concept of two-way feedback of the human body was published in 1992. The sensation of two-way body feedback is a special system of human reaction that maintains and regulates the symmetry and balance of the human body, and it responds to the state of human health. To preserve overall health and delay aging, it is necessary to pay attention to stimulations of the body (both passive acceptance and active interventions), their relevant influences, and their stimulative effects. This paper presents experimental research on stimulation and an example of two-way feedback in the human body, laying a foundation for prevention, medical treatment, and hygiene in support of overall human health.
Abstract: With the growth of the elderly population and rising health care costs, service robots are playing an increasingly important role in aiding the disabled and the elderly. Many researchers around the world have paid close attention to healthcare robots and rehabilitation robots. To achieve natural and harmonious communication between the user and a service robot, the robot's information perception/feedback ability and interaction ability are becoming more important among many key issues.
Funding: ZTE Industry-University-Institute Cooperation Funds under Grant No. 20200492.
Abstract: Anomaly detection is one particular challenge for large-scale software systems. System logs are a straightforward and common source of information for anomaly detection, but existing log-based anomaly detectors are unusable in real-world industrial systems due to their high false-positive rates. In this paper, we incorporate human feedback to adjust the detection model structure and thereby reduce false positives. We apply our approach to two large-scale industrial systems. Results show that our approach performs much better than state-of-the-art works, with 50% higher accuracy. Moreover, human feedback can eliminate more than 70% of false positives and greatly improve detection precision.
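The abstract does not specify how operator feedback is folded into the detector. As a minimal sketch of one common pattern (all class names, scores, and thresholds here are illustrative, not the paper's actual method), recurring false-positive labels from operators can be used to veto a log template that the base model keeps flagging:

```python
from collections import Counter


class FeedbackFilteredDetector:
    """Wraps a base anomaly scorer with operator feedback.

    Log templates repeatedly labeled as false positives by operators
    are suppressed; all other templates pass through the base scorer.
    """

    def __init__(self, base_score, threshold=0.5, fp_limit=3):
        self.base_score = base_score   # callable: template -> anomaly score in [0, 1]
        self.threshold = threshold     # flag templates scoring above this
        self.fp_limit = fp_limit       # suppress after this many FP labels
        self.fp_counts = Counter()     # template -> number of FP labels so far

    def record_feedback(self, template, is_false_positive):
        """Record one operator judgment on a flagged template."""
        if is_false_positive:
            self.fp_counts[template] += 1

    def is_anomalous(self, template):
        """Flag a template unless operators have repeatedly vetoed it."""
        if self.fp_counts[template] >= self.fp_limit:
            return False               # enough FP labels: treat as benign
        return self.base_score(template) > self.threshold
```

For example, a noisy "disk_scan" template that the base scorer always flags stops being reported after three operators mark it as a false positive, while genuinely anomalous templates are unaffected.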
Abstract: Reinforcement learning with human feedback (RLHF) is the mainstream method for aligning today's large language models (LLMs), but its core optimization algorithm, proximal policy optimization (PPO), suffers from significant efficiency problems. PPO consists of three interdependent stages (generation, inference, and training), each with different computational characteristics. However, existing RLHF parallel frameworks execute all PPO stages sequentially under a single parallelization strategy, which causes two problems: first, the generation stage cannot fully utilize the compute resources, hurting overall efficiency; second, strict serial execution between stages leaves potential parallelism unexploited. To address these problems, this paper proposes Pipe-RLHF, a new RLHF parallel framework. Pipe-RLHF adaptively determines the optimal parallelization strategy for each stage according to its computational characteristics, breaks the existing stage-serial paradigm, and uses an asynchronous PPO algorithm to exploit inter-stage parallelism. Specifically, it introduces a novel delayed inter-batch pipeline parallelism method for PPO's generation stage, significantly improving that stage's compute utilization; it then uses asynchronous PPO to relax inter-stage dependencies and applies inter-stage parallelism to accelerate PPO; finally, for end-to-end optimization of the PPO algorithm, it constructs a hierarchical parallelization-strategy space and proposes an optimization algorithm that searches this space for the optimal solution. Performance evaluations on several large language models show that Pipe-RLHF achieves up to a 3.7x speedup over existing methods, validating the framework's effectiveness and superiority.
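The abstract describes inter-batch pipelining of PPO's three stages only at a high level. As a back-of-the-envelope illustration of why overlapping stages across micro-batches helps (the per-stage costs below are made up and not from the paper), a serial schedule pays the full generate + infer + train cost for every batch, while a pipelined schedule pays it once and then only the bottleneck stage per additional batch:

```python
# Illustrative per-batch stage costs (arbitrary time units, not measured).
GEN, INF, TRAIN = 4, 2, 3


def serial_time(n_batches):
    """Stage-serial schedule: each batch runs generate -> infer -> train
    to completion before the next batch starts."""
    return n_batches * (GEN + INF + TRAIN)


def pipelined_time(n_batches):
    """Idealized inter-batch pipeline: batch k+1's generation starts while
    batch k is still in inference/training.  Assuming no resource
    contention, the first batch pays the full sum and each extra batch
    adds only the slowest (bottleneck) stage."""
    bottleneck = max(GEN, INF, TRAIN)
    return (GEN + INF + TRAIN) + (n_batches - 1) * bottleneck


if __name__ == "__main__":
    n = 8
    print(f"serial:    {serial_time(n)}")     # 8 * 9 = 72
    print(f"pipelined: {pipelined_time(n)}")  # 9 + 7 * 4 = 37
```

With these toy numbers, pipelining roughly halves the schedule length; the steady-state throughput is limited by the bottleneck stage, which is why the framework also needs per-stage parallelization strategies to shrink that bottleneck.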