
Multi-modal cognitive computing (多模态认知计算)

Cited by: 23
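Abstract: Humans understand their surroundings through multiple senses such as vision and hearing, and form a holistic understanding of events by integrating multiple perceptual modalities. To help machines better imitate human cognitive abilities, multi-modal cognitive computing simulates human "synaesthesia" and explores efficient perception and comprehensive understanding of multi-modal inputs such as images, video, text, and speech. It is an important research topic in artificial intelligence and one of the keys to achieving "general artificial intelligence". In recent years, with the explosive growth of multi-modal spatio-temporal data and the rapid improvement of computing power, researchers in China and abroad have proposed a large number of methods to meet increasingly diverse demands. However, current multi-modal cognitive computing is still limited to imitating the apparent abilities of humans and lacks a theoretical basis at the cognitive level. Starting from information theory, this paper establishes an information transmission model of the cognitive process and, drawing on information capacity (信容), puts forward the view that multi-modal cognitive computing can improve a machine's ability to extract information, thereby unifying the tasks of multi-modal cognitive computing on a theoretical level. Then, according to the machine's modes of cognition of multi-modal information, existing methods are reviewed and summarized from three aspects: multi-modal correlation, cross-modal generation, and multi-modal collaboration, with a systematic analysis of the key problems and their solutions. Finally, in light of the characteristics of the current stage of artificial intelligence development, the paper reflects on the difficulties and challenges facing the field and offers an in-depth analysis and outlook on future trends.

The human brain perceives its surroundings through multiple sensory organs and integrates these multi-sensory perceptions to generate a comprehensive understanding. Inspired by synaesthesia, multi-modal cognitive computing endows machines with multi-sensory capabilities and has become the key to general artificial intelligence. With the explosion of multi-modal data such as image, video, text, and audio, a large number of methods have been developed to address this topic. However, the theoretical basis of multi-modal cognitive computing is still unclear. From the perspective of information theory, this paper establishes an information transmission model to profile the cognitive process. Based on the theory of information capacity, this study finds out that multi-modal cognitive computing helps machines extract more information. In this way, multi-modal cognitive computing research is unified by the same theoretical basis. Then, the development of typical tasks is reviewed and discussed, including multi-modal correlation, cross-modal generation, and multi-modal collaboration. Finally, focusing on the opportunities and challenges faced by multi-modal cognitive computing, some potential directions are discussed in depth, and several open-ended questions are considered.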
Author: Xuelong LI (李学龙) (School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an 710072, China; Key Laboratory of Intelligent Interaction and Applications (Northwestern Polytechnical University), Ministry of Industry and Information Technology, Xi'an 710072, China)
Source: Scientia Sinica Informationis (《中国科学:信息科学》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 1, pp. 1-32 (32 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61871470).
Keywords: artificial intelligence; multi-modal; cognitive computing; synaesthesia; information capacity