Multimodal perception is a foundational technology for human perception in complex environments. These environments often involve various interference conditions and sensor limitations that constrain the information-capture capabilities of single-modality sensors. Multimodal perception addresses these constraints by integrating complementary multisource heterogeneous information, providing a solution for perceiving complex environments. This technology spans fields such as autonomous driving, industrial detection, biomedical engineering, and remote sensing. However, challenges arise from multisensor misalignment, inadequate appearance forms, and perception-oriented issues, which complicate the correspondence relationships, information representation, and task-driven fusion. In this context, the advancement of artificial intelligence (AI) has driven the development of information fusion, offering a new perspective on tackling these challenges.[1] AI leverages deep neural networks (DNNs) with gradient-descent optimization to learn statistical regularities from multimodal data. By examining the entire process of multimodal information fusion, we can gain deeper insight into AI's working mechanisms and enhance our understanding of AI perception in complex environments.
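To make the last point concrete, the idea of a DNN-style model learning statistical regularities from fused multimodal data can be sketched in a minimal toy example. The sketch below is illustrative only and not from the paper: two synthetic sensing modalities (hypothetical "camera" and "radar" feature vectors) are combined by early, feature-level concatenation, and a single logistic layer is fit by plain gradient descent.

```python
import numpy as np

# Toy example (all names, dimensions, and data are illustrative assumptions):
# two modalities are fused by concatenating their feature vectors, then a
# logistic model is trained by gradient descent on a synthetic binary task.
rng = np.random.default_rng(0)

n, d_cam, d_rad = 200, 3, 2
x_cam = rng.normal(size=(n, d_cam))      # modality 1: "camera" features
x_rad = rng.normal(size=(n, d_rad))      # modality 2: "radar" features
x = np.hstack([x_cam, x_rad])            # early (feature-level) fusion

w_true = rng.normal(size=d_cam + d_rad)  # hidden regularity to recover
y = (x @ w_true > 0).astype(float)       # binary perception target

w = np.zeros(d_cam + d_rad)
lr = 0.5
for _ in range(500):                     # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(x @ w)))   # sigmoid prediction
    grad = x.T @ (p - y) / n             # gradient of the logistic loss
    w -= lr * grad

pred = 1.0 / (1.0 + np.exp(-(x @ w))) > 0.5
acc = (pred == (y == 1)).mean()
print(f"training accuracy: {acc:.2f}")
```

A real multimodal pipeline would replace the linear layer with a deep network and the concatenation with a learned fusion module, but the optimization principle (gradient descent on a task-driven loss over fused features) is the same one the abstract describes.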