Funding: Supported in part by the National Natural Science Foundation of China (NSFC, 62125106, 61860206003, and 62088102), in part by the Ministry of Science and Technology of China (2021ZD0109901), and in part by the Provincial Key Research and Development Program of Zhejiang (2021C01016).
Abstract: Anticipating others' actions is an innate and essential ability that allows humans to navigate and interact smoothly with others in dense crowds, and it is urgently required for unmanned systems such as service robots and self-driving cars. However, existing solutions struggle to predict pedestrian anticipation accurately because the influence of group-related social behaviors has not been well considered. While group relationships and group interactions are ubiquitous and significantly influence pedestrian anticipation, their influence is diverse and subtle, making it difficult to quantify explicitly. Here, we propose the group interaction field (GIF), a novel group-aware representation that quantifies pedestrian anticipation as a probability field of pedestrians' future locations and attention orientations. An end-to-end neural network, GIFNet, is tailored to estimate the GIF from explicit multidimensional observations. GIFNet quantifies the influence of group behaviors by formulating a group interaction graph with propagation and graph attention that adapts to the group size and dynamic interaction states. The experimental results show that the GIF effectively represents the change in pedestrians' anticipation under the prominent impact of group behaviors and accurately predicts pedestrians' future states. Moreover, the GIF helps explain the varied predictions of pedestrian behavior in different social states. The proposed GIF should eventually allow unmanned systems to work in a human-like manner and comply with social norms, thereby promoting harmonious human-machine relationships.
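The abstract describes the group interaction graph only at a high level. The following minimal sketch illustrates one graph-attention propagation step over grouped pedestrians, where attention is normalized within each group so the update adapts to group size; the class name, tensor shapes, and PyTorch implementation are our own assumptions for exposition, not the released GIFNet code.

```python
# Illustrative sketch (assumption, not the authors' code): one propagation step
# of graph attention restricted to pedestrians that share a group.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupGraphAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, group_mask: torch.Tensor) -> torch.Tensor:
        # x:          (N, dim) per-pedestrian interaction-state features
        # group_mask: (N, N) boolean, True where pedestrians i and j share a group
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.t() / x.shape[-1] ** 0.5            # pairwise attention logits
        scores = scores.masked_fill(~group_mask, float("-inf"))
        attn = F.softmax(scores, dim=-1)                   # normalized within each group, any size
        attn = torch.nan_to_num(attn)                      # pedestrians with no group get zero messages
        messages = attn @ v                                # aggregate group influence
        return x + self.update(torch.cat([x, messages], dim=-1))  # one propagation step

# Usage: three pedestrians; the first two form a group, the third walks alone.
feats = torch.randn(3, 16)
mask = torch.tensor([[1, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=torch.bool)
updated = GroupGraphAttention(16)(feats, mask)
```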
Funding: Supported in part by the Natural Science Foundation of China (NSFC) under contracts No. 62205176, 62125106, 61860206003, 62088102, and 62271283; in part by the Ministry of Science and Technology of China under contract No. 2021ZD0109901; and in part by the China Postdoctoral Science Foundation under contract No. 2022M721889.
Abstract: Scalable, high-capacity, and low-power computing architectures are the primary assurance for increasingly manifold and large-scale machine learning tasks. Traditional electronic artificial agents built on conventional power-hungry processors have hit energy and scaling walls, hindering sustainable performance improvement and iterative multi-task learning. Turning to the modality of light, photonic computing has been progressively applied in highly efficient neuromorphic systems. Here, we introduce a reconfigurable lifelong-learning optical neural network (L2ONN) for highly integrated tens-of-task machine intelligence with elaborate algorithm-hardware co-design. Benefiting from the inherent sparsity and parallelism of massive photonic connections, L2ONN learns each single task by adaptively activating sparse photonic neuron connections in the coherent light field, while incrementally acquiring expertise on various tasks by gradually enlarging the activation. The multi-task optical features are processed in parallel by multi-spectrum representations allocated to different wavelengths. Extensive evaluations on free-space and on-chip architectures confirm that, for the first time, L2ONN avoids the catastrophic forgetting issue of photonic computing, exhibiting versatile skills on challenging tens of tasks (vision classification, voice recognition, medical diagnosis, etc.) with a single model. In particular, L2ONN achieves more than an order of magnitude higher efficiency than representative electronic artificial neural networks and 14× larger capacity than existing optical neural networks, while maintaining competitive performance on each individual task. The proposed photonic neuromorphic architecture points toward a new form of lifelong learning, permitting terminal/edge AI systems with light-speed efficiency and unprecedented scalability.
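As a rough numerical illustration of the sparse-activation idea in the abstract, the sketch below activates a per-task sparse subset of a shared connection field and freezes connections claimed by earlier tasks, which is one way the forgetting-avoidance intuition can be expressed in software. All names, sizes, and the masking mechanism are assumptions for exposition; they do not reproduce the photonic hardware or the published L2ONN implementation.

```python
# Illustrative sketch (our assumption of the mechanism, not the published code):
# each task activates a sparse subset of a shared connection field; new tasks
# claim only still-free connections, so earlier tasks are never overwritten.
import numpy as np

rng = np.random.default_rng(0)
DIM = 64                                      # hypothetical number of photonic "neurons"
weights = rng.standard_normal((DIM, DIM))     # shared connection field
frozen = np.zeros((DIM, DIM), dtype=bool)     # connections claimed by earlier tasks
task_masks = {}

def activate_task(task_id: str, sparsity: float = 0.1) -> np.ndarray:
    """Claim a sparse set of still-free connections for a new task."""
    free = np.flatnonzero(~frozen)
    chosen = rng.choice(free, size=int(sparsity * frozen.size), replace=False)
    mask = np.zeros(frozen.size, dtype=bool)
    mask[chosen] = True
    mask = mask.reshape(frozen.shape)
    task_masks[task_id] = mask
    frozen[mask] = True                        # later tasks cannot disturb these connections
    return mask

def forward(task_id: str, x: np.ndarray) -> np.ndarray:
    """Propagate an input through only the connections active for this task."""
    return (weights * task_masks[task_id]) @ x

for task in ["vision", "voice", "medical"]:    # incrementally enlarge the activated set
    activate_task(task)

y = forward("voice", rng.standard_normal(DIM))
```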
Funding: Supported in part by the Ministry of Science and Technology of China under contract No. 2021ZD0109901; in part by the Natural Science Foundation of China (NSFC) under contracts No. 62125106, 61860206003, and 62088102; in part by the Beijing National Research Center for Information Science and Technology (BNRist) under Grant No. BNR2020RC01002; in part by the Young Elite Scientists Sponsorship Program by CAST No. 2021QNRC001; in part by the Shuimu Tsinghua Scholar Program; in part by the China Postdoctoral Science Foundation No. 2022M711874; and in part by the Postdoctoral International Exchange Program No. YJ20210124.
Abstract: Endowed with superior computing speed and energy efficiency, optical neural networks (ONNs) have attracted ever-growing attention in recent years. Existing optical computing architectures are mainly single-channel because of the lack of advanced optical connection and interaction operators, and they solve simple tasks such as handwritten digit classification, saliency detection, etc. The limited computing capacity and scalability of single-channel ONNs restrict the optical implementation of advanced machine vision. Herein, we develop Monet, a multichannel optical neural network architecture for universal multiple-input multiple-channel optical computing, based on a novel projection-interference-prediction framework in which the inter- and intra-channel connections are mapped to optical interference and diffraction. In Monet, optical interference patterns are generated by projecting and interfering the multichannel inputs in a shared domain. These patterns, which encode the correspondences together with feature embeddings, are produced iteratively through the projection-interference process to predict the final output optically. For the first time, Monet validates that multichannel processing can be implemented optically with high efficiency, enabling real-world intelligent multichannel-processing tasks, including 3D/motion detection, to be solved via optical computing. Extensive experiments on different scenarios demonstrate the effectiveness of Monet in handling advanced machine vision tasks with accuracy comparable to that of electronic counterparts while achieving a ten-fold improvement in computing efficiency. For intelligent computing, the trend toward real-world advanced tasks is irreversible. By breaking the capacity and scalability limitations of single-channel ONNs and further exploring the multichannel processing potential of wave optics, we anticipate that the proposed technique will accelerate the development of more powerful optical AI as critical support for modern advanced machine vision.
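To make the projection-interference idea more concrete, the following sketch coherently superposes projected complex channel fields in a shared plane and reads out the resulting intensity as the interference pattern. The FFT-based propagation model, resolutions, and variable names are illustrative assumptions rather than the authors' Monet implementation.

```python
# Wave-optics sketch (assumption, not the Monet code): project each channel with
# its own phase modulation, interfere the projected fields in a shared domain,
# and measure the intensity pattern that encodes inter-channel correspondences.
import numpy as np

rng = np.random.default_rng(0)
H = W = 32                                          # hypothetical field resolution
C = 4                                               # number of input channels

channels = rng.random((C, H, W))                    # real-valued channel inputs
phase_masks = rng.uniform(0, 2 * np.pi, (C, H, W))  # per-channel projection phases

def project(channel: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Encode one channel as a complex field and project it to the shared plane."""
    field = channel * np.exp(1j * phase)
    # Free-space propagation to the shared plane, modeled here by a 2-D FFT.
    return np.fft.fft2(field)

# Inter-channel connection: coherent superposition (interference) in the shared domain.
shared_field = sum(project(channels[c], phase_masks[c]) for c in range(C))
interference_pattern = np.abs(shared_field) ** 2     # detector measures intensity

# Intra-channel connection: diffraction of each projected field alone, for comparison.
per_channel_intensity = np.stack(
    [np.abs(project(channels[c], phase_masks[c])) ** 2 for c in range(C)]
)
```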