Abstract: To address the shortcomings of traditional guide devices, such as limited environmental perception and poor terrain adaptability, this paper proposes an intelligent guide system built on a quadruped robot platform. Data fusion between a millimeter-wave radar (with an accuracy of ±0.1°) and an RGB-D camera is achieved through multisensor spatiotemporal registration, and a dataset tailored to guide dog robots is constructed. For the edge-deployed guide dog robot scenario, a lightweight CA-YOLOv11 object detection model incorporating an attention mechanism is adopted, achieving a comprehensive recognition accuracy of 95.8% in complex scenarios, 2.2% higher than the baseline YOLOv11 network. The system supports navigation over complex terrain such as stairs (25 cm steps) and slopes (35° gradient), and its response time to sudden disturbances is reduced to 100 ms. Field tests show a navigation success rate of 95% across eight scenario types, a user satisfaction score of 4.8/5.0, and a cost 50% lower than that of a traditional guide dog.
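The spatiotemporal registration step mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the linear-interpolation scheme, and the 4×4 extrinsic matrix are all assumptions for the sake of example.

```python
import numpy as np

def align_radar_to_camera(radar_ts, radar_pts, cam_ts, T_cam_radar):
    """Temporally interpolate radar points to a camera timestamp, then
    transform the result into the camera frame (hypothetical sketch).

    radar_ts    : (N,) radar timestamps, sorted ascending
    radar_pts   : (N, 3) radar points in the radar frame
    cam_ts      : scalar camera timestamp to align to
    T_cam_radar : (4, 4) homogeneous camera<-radar extrinsic transform
    """
    # Temporal registration: interpolate each coordinate between the
    # two radar samples that bracket the camera timestamp.
    x = np.interp(cam_ts, radar_ts, radar_pts[:, 0])
    y = np.interp(cam_ts, radar_ts, radar_pts[:, 1])
    z = np.interp(cam_ts, radar_ts, radar_pts[:, 2])
    p_radar = np.array([x, y, z, 1.0])  # homogeneous coordinates

    # Spatial registration: apply the calibrated extrinsic transform.
    p_cam = T_cam_radar @ p_radar
    return p_cam[:3]

# Usage: identity rotation with a 0.5 m translation along the camera z-axis.
T = np.eye(4); T[2, 3] = 0.5
ts = np.array([0.0, 0.1])
pts = np.array([[1.0, 0.0, 2.0], [1.0, 0.0, 3.0]])
aligned = align_radar_to_camera(ts, pts, 0.05, T)  # midpoint, shifted by extrinsics
```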
Abstract: Intelligent robots are increasingly being deployed across settings ranging from manufacturing to household applications and outdoor exploration. Their ability to avoid obstacles autonomously in complex environments has become a critical factor in operational stability. Multimodal perception, which integrates visual, auditory, tactile, and LiDAR data, provides robots with comprehensive environmental awareness. By building efficient autonomous obstacle-avoidance decision-making mechanisms on this information, a system's adaptability to challenging scenarios can be significantly enhanced. This study investigates the integration of multimodal perception with autonomous obstacle-avoidance decision-making: it analyzes the acquisition and processing of perceptual information and the core modules and logic of the decision-making mechanism, and it proposes optimization strategies for specific scenarios. The research aims to provide theoretical references for advancing autonomous obstacle-avoidance technology in intelligent robots, enabling safer and more flexible movement in diverse environments.
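A multimodal obstacle-avoidance decision mechanism of the kind described above can be illustrated with a toy rule-based fusion. The thresholds, action labels, and priority ordering here are illustrative assumptions, not the study's actual mechanism.

```python
def avoidance_decision(lidar_range_m, vision_obstacle_conf, tactile_contact):
    """Toy decision rule fusing three modalities into one action label.
    Priority order (contact > range > vision) is an illustrative choice."""
    if tactile_contact:              # contact already made: back away
        return "reverse"
    if lidar_range_m < 0.3:          # hard geometric safety limit
        return "stop"
    if vision_obstacle_conf > 0.7:   # likely obstacle ahead: change heading
        return "steer"
    return "proceed"

# Usage: clear path vs. an object inside the LiDAR safety radius.
print(avoidance_decision(2.0, 0.1, False))
print(avoidance_decision(0.2, 0.1, False))
```

A real system would replace these hand-set thresholds with learned or calibrated values, but the priority structure (tactile overrides range, range overrides vision) is a common safety-first pattern.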
Funding: National Natural Science Foundation of China (52227808, 62202285); National Science Foundation for Distinguished Young Scholars of China (51725505); Development Fund for Shanghai Talents (No. 2021003); Shanghai Collaborative Innovation Center of Intelligent Perception Chip Technology.
Abstract: Neuromorphic devices, inspired by the intricate architecture of the human brain, are recognized for their computational speed and parallel computing capabilities. Vision, the primary mode of external information acquisition in living organisms, has attracted substantial scholarly interest. Although numerous studies have simulated the retina with optical synapses, their applications remain limited to single-mode perception. Moreover, temperature, a fundamental regulator of biological activity, has largely been neglected. To address these limitations, we present a neuromorphic device with multimodal perception, grounded in the principles of light-modulated semiconductors. The device accomplishes dynamic hybrid visual and thermal multimodal perception, featuring temperature-dependent paired-pulse facilitation and adaptive storage. Crucially, our examination of transfer curves, capacitance–voltage (C–V) tests, and noise measurements provides insight into interface and bulk defects, elucidating the physical mechanisms underlying adaptive storage and other functionalities. The device also demonstrates a variety of synaptic functions, including filtering behavior, Ebbinghaus forgetting curves, and memory applications in image recognition, where the digit recognition rate reaches a remarkable 98.8%.
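Paired-pulse facilitation (PPF), mentioned in the abstract above, is conventionally quantified as the ratio of the second response amplitude to the first, decaying with the inter-pulse interval. The sketch below uses a generic phenomenological model with a temperature-scaled time constant; all parameter values and the temperature-scaling law are illustrative assumptions, not fits to the reported device.

```python
import math

def ppf_ratio(dt_ms, temp_k, c0=0.6, tau0_ms=40.0, t_ref_k=300.0):
    """Phenomenological paired-pulse facilitation ratio A2/A1.

    Facilitation decays exponentially with the inter-pulse interval
    dt_ms; the decay constant is assumed to shrink at higher
    temperature (illustrative model, parameters hypothetical).
    """
    tau = tau0_ms * (t_ref_k / temp_k)      # assumed: faster decay when hotter
    return 1.0 + c0 * math.exp(-dt_ms / tau)

# Usage: facilitation weakens as pulses are spaced further apart,
# and (in this model) also weakens at elevated temperature.
r_short, r_long = ppf_ratio(10, 300), ppf_ratio(100, 300)
r_hot = ppf_ratio(10, 350)
```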
Funding: National Key R&D Program of China (No. 2023YFC3305600); Joint Fund of the Ministry of Education of China (Nos. 8091B022149, 8091B02072404); National Natural Science Foundation of China (Nos. 62132016, 62171343). Figures generated using MATLAB (R2023a) and Adobe Illustrator (CC 2024).
Abstract: This study explores the development of an AI-driven multimodal perception robotic arm that uses magnetically actuated components to enhance the autonomy and quality of life of individuals with disabilities. Traditional prosthetics are often limited in flexibility and operability, hindering users' ability to perform everyday tasks. The integration of magnetic actuators, AI-based control, and real-time signal processing offers a promising solution to these challenges. This paper details the methodology, including signal acquisition, preprocessing, feature extraction, and the combination of magnetic actuation with AI algorithms for adaptive control. Through precise, flexible control strategies, the robotic arm is designed to provide natural, efficient assistance to disabled individuals, aiming to increase user independence in both daily life and rehabilitation.
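The signal-processing pipeline named above (acquisition, preprocessing, feature extraction) commonly computes standard time-domain features over sliding windows of the control signal. The sketch below shows a generic feature set of this kind; the paper's exact features and window parameters are not specified, so this is an illustrative example only.

```python
import numpy as np

def window_features(window):
    """Standard time-domain features over one analysis window of a
    biosignal (generic set; not the paper's specific choice)."""
    w = np.asarray(window, dtype=float)
    mav = float(np.mean(np.abs(w)))                 # mean absolute value
    rms = float(np.sqrt(np.mean(w ** 2)))           # root mean square
    # Zero crossings: count sign changes between consecutive samples.
    zc = int(np.sum(np.signbit(w[:-1]) != np.signbit(w[1:])))
    wl = float(np.sum(np.abs(np.diff(w))))          # waveform length
    return {"mav": mav, "rms": rms, "zc": zc, "wl": wl}

# Usage on a tiny synthetic window.
feats = window_features([0.1, -0.2, 0.3, -0.1])
```

Feature vectors like this one would then be fed to the adaptive controller, e.g. as input to a classifier or regressor mapping signal patterns to arm commands.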
Funding: National Key R&D Program of China (2021YFA1200501 and 2023YFE0210800); National Natural Science Foundation of China (U22A20137 and U21A2069); Natural Science Foundation of Hubei Province (2024AFE009); Natural Science Foundation of Guangdong Province (2025A1515011072); Shenzhen Science and Technology Innovation Program (JCYJ20240813153403005, GJHZ20210705142542015, JCYJ20220530160811027, and CJGJZD20240729143104006); Basic Research Support Program of HUST (2025BRA006); Special Zone Program of the Wuhan Natural Science Foundation.
Abstract: Multimodal perception, pivotal for artificial intelligence (AI) systems that demand real-time decision-making and environmental adaptability, could be significantly improved by two-dimensional (2D) piezo-ferro-opto-electronic (PFOE) semiconductors such as NbOX₂ (X = Cl, Br, I). Such improvement may enable in-sensor fusion of sensory signals (e.g., vision, audition, gustation, and olfaction) within a single functional component, overcoming the limitations of conventional discrete sensor architectures. This functional cohesion, combined with the recently uncovered properties of these materials, not only provides a robust foundation for expanding sensory modalities and developing novel mechanisms toward an all-in-one multimodal perception platform, but also paves the way for multisensory-integrated artificial systems beyond human sensory capabilities. A single-component system employing such PFOE semiconductors substantially reduces intermodule communication latency while increasing the integration density of information, circumventing persistent inefficiencies in AI hardware architectures for real-time applications such as embodied robotics and immersive human–machine interfaces. This fusion of multimodal perception and computation, enabled by the multiphysics coupling of 2D NbOX₂, drives AI systems toward biological-grade efficiency while maintaining environmental adaptability, a critical step toward autonomous intelligence operating in dynamic real-world settings.
Funding: National Natural Science Foundation of China (U21A20119, 62103395, and 51975550).
Abstract: The ability of bipedal humanoid robots to walk adaptively on varied terrain is a critical challenge for practical applications and has drawn substantial attention from academic and industrial research communities in recent years. Traditional model-based locomotion control methods involve high modeling complexity, especially in complex terrain environments, making locomotion stability difficult to ensure. Reinforcement learning offers an end-to-end solution for locomotion control in humanoid robots, but such approaches typically rely solely on proprioceptive sensing to generate control policies, often resulting in frequent body collisions in practice. Excessive collisions can damage the robot hardware; more critically, the absence of multimodal input such as vision limits the robot's ability to perceive environmental context and adjust its gait trajectory promptly, hampering stability and robustness during tasks. In this paper, visual information is added to the locomotion control problem of humanoid robots, and a three-stage multi-objective constrained policy distillation optimization algorithm is proposed. Expert policies for different terrains, trained via reinforcement learning to meet gait-aesthetics requirements, are then distilled into a single student policy through policy distillation. Experimental results demonstrate a significant reduction in collision rates when using a control policy that integrates multimodal perception, especially on challenging terrain such as stairs, thresholds, and mixed surfaces. This advancement supports the practical deployment of bipedal humanoid robots.
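The policy-distillation step named above is typically driven by a divergence between the expert's and the student's action distributions. The sketch below shows the standard KL-based distillation objective for discrete action logits; the paper's three-stage, multi-objective constrained formulation is not reproduced here, so treat this as a generic baseline only.

```python
import numpy as np

def distillation_loss(student_logits, expert_logits):
    """KL(expert || student) over action distributions: the standard
    objective for distilling an expert policy into a student.
    (The paper's additional multi-objective constraints are omitted.)"""
    def softmax(z):
        e = np.exp(z - z.max())   # subtract max for numerical stability
        return e / e.sum()
    p = softmax(np.asarray(expert_logits, dtype=float))   # expert targets
    q = softmax(np.asarray(student_logits, dtype=float))  # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Usage: the loss vanishes when student and expert agree exactly,
# and is strictly positive whenever they differ.
zero_loss = distillation_loss([1.0, 2.0], [1.0, 2.0])
pos_loss = distillation_loss([2.0, 1.0], [1.0, 2.0])
```

In a multi-terrain setting, one such term per terrain expert would be summed (or constrained) so a single student policy matches each expert on its own terrain distribution.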
Abstract: Multimodal perception is a foundational technology for human-like perception in complex environments. These environments often involve interference conditions and sensor limitations that constrain the information-capture capability of single-modality sensors. Multimodal perception addresses this by integrating complementary multisource heterogeneous information, providing a solution for perceiving complex environments. The technology spans fields such as autonomous driving, industrial inspection, biomedical engineering, and remote sensing. However, challenges arise from multisensor misalignment, inadequate appearance representations, and perception-oriented issues, which complicate cross-modal correspondence, information representation, and task-driven fusion. In this context, the advancement of artificial intelligence (AI) has driven the development of information fusion, offering a new perspective on tackling these challenges.[1] AI leverages deep neural networks (DNNs) with gradient-descent optimization to learn statistical regularities from multimodal data. By examining the entire process of multimodal information fusion, we can gain deeper insight into AI's working mechanisms and enhance our understanding of AI perception in complex environments.
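The final point above, that DNNs learn statistical regularities from multimodal data via gradient descent, can be reduced to its smallest working form: fuse two modality feature vectors by concatenation, then take one gradient-descent step on a linear head under a squared-error loss. Everything here (dimensions, learning rate, target) is an illustrative toy, not any specific fusion architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy late fusion: concatenate two modality feature vectors into one
# representation, then fit a linear head with one gradient step.
x_vis = rng.normal(size=8)    # stand-in visual features (hypothetical)
x_lidar = rng.normal(size=4)  # stand-in LiDAR features (hypothetical)
x = np.concatenate([x_vis, x_lidar])        # fused representation

w, b, y = np.zeros_like(x), 0.0, 1.0        # weights, bias, target label

pred = w @ x + b                            # initial prediction (0.0)
# Gradients of the squared error (pred - y)^2 w.r.t. w and b.
grad_w = 2.0 * (pred - y) * x
grad_b = 2.0 * (pred - y)
lr = 0.01
w, b = w - lr * grad_w, b - lr * grad_b     # one gradient-descent update

new_pred = w @ x + b                        # prediction after the step
```

A real multimodal DNN stacks many nonlinear layers and often fuses earlier or with attention, but the learning signal, a loss gradient flowing back through the fused representation, is exactly this mechanism.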