219 articles found
RSG-Conformer: ReLU-Based Sparse and Grouped Conformer for Audio-Visual Speech Recognition
1
Authors: Yewei Xiao, Xin Du, Wei Zeng. Computers, Materials & Continua, 2026, No. 3, pp. 1325-1348 (24 pages)
Audio-visual speech recognition (AVSR), which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions, has attracted significant research interest. However, Conformer-based architectures remain computationally expensive because the space and time complexity of their softmax-based attention mechanisms grows quadratically with sequence length. In addition, Conformer-based architectures may not provide sufficient flexibility for modeling local dependencies at different granularities. To mitigate these limitations, this study introduces a novel AVSR framework based on a ReLU-based Sparse and Grouped Conformer (RSG-Conformer) architecture. Specifically, we propose a Global-enhanced Sparse Attention (GSA) module incorporating an efficient context restoration block to recover lost contextual cues. Concurrently, a Grouped-scale Convolution (GSC) module replaces the standard Conformer convolution module, providing adaptive local modeling across varying temporal resolutions. Furthermore, we integrate a Refined Intermediate Contextual CTC (RIC-CTC) supervision strategy. This approach applies progressively increasing loss weights combined with convolution-based context aggregation, thereby further relaxing the conditional independence constraint inherent in standard CTC frameworks. Evaluations on the LRS2 and LRS3 benchmarks validate the efficacy of our approach, with word error rates (WERs) reduced to 1.8% and 1.5%, respectively. These results demonstrate its state-of-the-art performance in AVSR tasks.
Keywords: audio-visual speech recognition; Conformer; CTC; sparse attention
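The abstract above mentions progressively increasing loss weights for intermediate CTC supervision but does not give the schedule. As a rough illustration only, the sketch below assumes a linear schedule over the intermediate layers. The function names, the linear form, and the 0.3 total weight budget are hypothetical and are not taken from the paper.

```python
def intermediate_ctc_weights(num_layers, total=0.3):
    """Return linearly increasing weights that sum to `total`.

    Assumed schedule: the k-th intermediate layer (1-based) gets a weight
    proportional to k, so deeper layers contribute more to the loss.
    """
    if num_layers <= 0:
        return []
    ranks = range(1, num_layers + 1)
    norm = sum(ranks)
    return [total * rank / norm for rank in ranks]


def combined_loss(final_loss, intermediate_losses, total=0.3):
    """Mix the final-layer loss with weighted intermediate losses.

    The final layer keeps (1 - total) of the weight; the rest is shared
    among the intermediate layers according to the schedule above.
    """
    weights = intermediate_ctc_weights(len(intermediate_losses), total)
    auxiliary = sum(w * loss for w, loss in zip(weights, intermediate_losses))
    return (1.0 - total) * final_loss + auxiliary
```

With three intermediate layers and the assumed budget of 0.3, the weights come out in the ratio 1:2:3. Any other monotonically increasing schedule would fit the abstract's description equally well.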
Cultivation of Students’ Critical Thinking Ability in College English Audio-Visual and Oral Teaching (Cited by: 1)
2
Author: Hui Zhang. Journal of Contemporary Educational Research, 2025, No. 6, pp. 36-41 (6 pages)
With the increasingly prominent trend of globalization, English, as the common language of international communication, plays an increasingly important role in university education. As a key link in English teaching, the college English audio-visual and oral course not only imparts language knowledge and skills but also shoulders the important task of cultivating students’ critical thinking. As one of the essential core qualities of modern talent, critical thinking ability plays an irreplaceable role in deepening students’ understanding of English knowledge, improving their intercultural communication ability, and cultivating innovative thinking. This paper expounds the significance of cultivating students’ critical thinking ability in college English audio-visual and oral teaching and, drawing on practical teaching experience and cutting-edge educational theory, puts forward a series of innovative teaching strategies for cultivating that ability, in order to provide new ideas and practical guidance for improving the quality of college English teaching and developing students’ comprehensive quality.
Keywords: Critical thinking ability; College English audio-visual and oral teaching
Deep Audio-visual Learning: A Survey (Cited by: 6)
3
Authors: Hao Zhu, Man-Di Luo, Rui Wang, Ai-Hua Zheng, Ran He. International Journal of Automation and Computing (indexed in EI, CSCD), 2021, No. 3, pp. 351-376 (26 pages)
Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully. Researchers tend to leverage these two modalities to improve the performance of previously considered single-modality tasks or to address new challenging problems. In this paper, we provide a comprehensive survey of recent audio-visual learning development. We divide the current audio-visual learning tasks into four different subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual representation learning. State-of-the-art methods, as well as the remaining challenges of each subfield, are further discussed. Finally, we summarize the commonly used datasets and challenges.
Keywords: Deep audio-visual learning; audio-visual separation and localization; correspondence learning; generative models; representation learning
Integrating Audio-Visual Features and Text Information for Story Segmentation of News Video (Cited by: 1)
4
Authors: Liu Hua-yong, Zhou Dong-ru (School of Computer, Wuhan University, Wuhan 430072, Hubei, China). Wuhan University Journal of Natural Sciences (indexed in CAS), 2003, No. 04A, pp. 1070-1074 (5 pages)
Video data are composed of multimodal information streams, including visual, auditory, and textual streams, so this paper describes an approach to story segmentation for news video using multimodal analysis. The proposed approach detects the topic-caption frames and integrates them with silence clip detection results and shot segmentation results to locate the news story boundaries. The integration of audio-visual features and text information overcomes the weakness of approaches that use only image analysis techniques. On test data with 135 400 frames, when the boundaries between news stories are detected, an accuracy rate of 85.8% and a recall rate of 97.5% are obtained. The experimental results show that the approach is valid and robust.
Keywords: news video; story segmentation; audio-visual features analysis; text detection
AV-FDTI: Audio-visual fusion for drone threat identification (Cited by: 1)
5
Authors: Yizhuo Yang, Shenghai Yuan, Jianfei Yang, Thien Hoang Nguyen, Muqing Cao, Thien-Minh Nguyen, Han Wang, Lihua Xie. Journal of Automation and Intelligence, 2024, No. 3, pp. 144-151 (8 pages)
In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which have the potential to transport harmful payloads or cause significant damage, we present AV-FDTI, an innovative Audio-Visual Fusion system designed for Drone Threat Identification. AV-FDTI leverages the fusion of audio and omnidirectional camera feature inputs, providing a comprehensive solution to enhance the precision and resilience of drone classification and 3D localization. Specifically, AV-FDTI employs a CRNN network to capture vital temporal dynamics within the audio domain and utilizes a pretrained ResNet50 model for image feature extraction. Furthermore, we adopt a visual information entropy and cross-attention-based mechanism to enhance the fusion of visual and audio data. Notably, our system is trained based on automated Leica tracking annotations, offering accurate ground truth data with millimeter-level accuracy. Comprehensive comparative evaluations demonstrate the superiority of our solution over the existing systems. In our commitment to advancing this field, we will release this work as open-source code and wearable AV-FDTI design, contributing valuable resources to the research community.
Keywords: audio-visual fusion; Anti-UAV; Multi-modal fusion; Classification; 3D localization; Self-attention
A Review on Audio-visual Translation Studies
6
Author: 李瑶 (Li Yao). 《语言与文化研究》 (Language and Culture Studies), 2008, No. 1, pp. 146-150 (5 pages)
This paper presents a thorough review of research on audio-visual translation both at home and abroad. Reviewing foreign achievements in this field of translation studies can shed some light on audio-visual translation practice and research in China. The review of Chinese scholars’ audio-visual translation studies aims to indicate potential directions and guidelines for future work and to identify neglected aspects. Based on a summary of the relevant studies, possible topics for further study are proposed.
Keywords: audio-visual; translation; subtitling; dubbing
Audio-visual emotion recognition with multilayer boosted HMM
7
Authors: 吕坤 (Lü Kun), 贾云得 (Jia Yunde), 张欣 (Zhang Xin). Journal of Beijing Institute of Technology (indexed in EI, CAS), 2013, No. 1, pp. 89-93 (5 pages)
Emotion recognition has become an important task in modern human-computer interaction. A multilayer boosted HMM (MBHMM) classifier for automatic audio-visual emotion recognition is presented in this paper. A modified Baum-Welch algorithm is proposed for component HMM learning, and adaptive boosting (AdaBoost) is used to train ensemble classifiers for different layers (cues). Except for the first layer, the initial weights of the training samples in the current layer are determined by the recognition results of the ensemble classifier in the upper layer. Thus, the training procedure using the current cue can focus more on the samples that were difficult under the previous cue. Our MBHMM classifier combines these ensemble classifiers and takes advantage of the complementary information from multiple cues and modalities. Experimental results on audio-visual emotion data collected in Wizard of Oz scenarios and labeled under two types of emotion category sets demonstrate that our approach is effective and promising.
Keywords: emotion recognition; audio-visual fusion; Baum-Welch algorithm; multilayer boosted HMM; Wizard of Oz scenario
The Audio-Visual Performance Highlighted Craze in Chicago During Chinese New Year
8
China & The World Cultural Exchange, 2019, No. 2, pp. 38-39 (2 pages)
On February 10, 2019 (US Central Time), the China National Peking Opera Company (CNPOC) and the Hubei Chime Bells National Chinese Orchestra presented a fantastic audio-visual performance of Chinese Peking Opera and Chinese chime bells for an American audience at the world's top-level Buntrock Hall at Symphony Center.
Keywords: audio-visual performance; Chicago; Chinese New Year
Research on National Identity Based on National Audio-Visual Works: Taking Inner Mongolia as an Example
9
Authors: LIU Haitao, ZHANG Pei. Cultural and Religious Studies, 2021, No. 8, pp. 391-396 (6 pages)
Mongolian audio-visual works are an important vehicle for exploring the true significance of this national culture. This paper argues that the Mongolian people in Inner Mongolia continually strengthen individuals' sense of identity with the ethnic group as a whole through the influence of film, television, and music, and on this basis continually develop a new culture in line with modern and contemporary life, further enhancing their sense of belonging to the nation.
Keywords: Mongolian; audio-visual works; national identity
Application of Task-based Teaching Method to College Audio-visual English Teaching
10
Author: Liguo Shi. International Journal of Technology Management, 2015, No. 9, pp. 65-67 (3 pages)
Based on the current situation of college audio-visual English teaching in China, this article points out that avoidance in class is a serious phenomenon in college audio-visual English teaching. After further analysis, and taking into account the characteristics of college English audio-visual teaching in China, it proposes applying the task-based teaching method to college audio-visual English teaching, sets out the steps involved, and attempts to alleviate students' avoidance through this method.
Keywords: task-based teaching method; college English; audio-visual English teaching
Integrating Zhuang Culture Into College English Audio-Visual Speaking Course: A Multicultural Perspective
11
Authors: LUO Mei, CHEN Yingzhu. Cultural and Religious Studies, 2024, No. 12, pp. 801-805 (5 pages)
Zhuang culture, a representative of the native ethnic culture of Guangxi, China, is of great significance to Chinese culture. In order to promote traditional culture, enrich the teaching content of the College English Audio-Visual Speaking Course, and enhance the intercultural communication ability of college students, this paper explores, from a multicultural perspective, classroom practices for integrating indigenous Zhuang cultural elements into the College English Audio-Visual Speaking Course, providing new perspectives and a reference for multicultural education in foreign languages.
Keywords: Zhuang culture; College English Audio-Visual Speaking Course; classroom practice; multicultural perspective
Teaching Strategies of Visual Interpretation and Audio-visual Interpretation
12
Author: DONG Yusa. 《外文科技期刊数据库(文摘版)教育科学》 (Foreign Sci-Tech Journal Database (Abstract Edition): Educational Science), 2021, No. 3, pp. 113-117 (5 pages)
By distinguishing audio-visual interpretation from visual interpretation, this paper makes clear that the two belong to different categories in essence and working method, so as to avoid misunderstanding and confusion between them in learning. There are also some misconceptions in how they are taught. This paper explores teaching methods for visual interpretation and audio-visual interpretation that make the teaching process more reasonable and scientific.
Keywords: audio-visual interpretation; visual interpretation; teaching
Prioritized MPEG-4 Audio-Visual Objects Streaming over the DiffServ
13
Authors: 黄天云 (Huang Tianyun), 郑婵 (Zheng Chan). Journal of Electronic Science and Technology of China, 2005, No. 4, pp. 314-320 (7 pages)
The object-based scalable coding in MPEG-4 is investigated, and a prioritized transmission scheme for MPEG-4 audio-visual objects (AVOs) over the DiffServ network with a QoS guarantee is proposed. MPEG-4 AVOs are extracted and classified into different groups according to their priority values and scalable layers (visual importance). These priority values are mapped to the IP DiffServ per-hop behaviors (PHBs). This scheme can selectively discard packets of low importance in order to avoid network congestion. Simulation results show that the quality of the received video adapts gracefully to the network state, in contrast to 'best-effort' delivery. Also, by allowing the content provider to define the prioritization of each audio-visual object, the adaptive transmission of object-based scalable video can be customized based on the content.
Keywords: video streaming; quality of service (QoS); MPEG-4; audio-visual objects (AVOs); DiffServ; prioritization
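The prioritized scheme described in this abstract can be sketched roughly as follows. The DSCP code points used are the standard values for the EF, AF41, AF21, and default classes, but which AVO priority maps to which per-hop behavior, the `Packet` fields, and the capacity-based drop rule are all illustrative assumptions; the abstract does not specify them.

```python
from dataclasses import dataclass

# Standard DSCP code points for these per-hop behaviors.
DSCP = {"EF": 46, "AF41": 34, "AF21": 18, "BE": 0}

# Hypothetical mapping from AVO priority (0 = most important) to a PHB.
PRIORITY_TO_PHB = {0: "EF", 1: "AF41", 2: "AF21"}


@dataclass
class Packet:
    avo_id: int    # which audio-visual object the packet belongs to
    priority: int  # lower number = higher visual importance
    size: int      # payload size in bytes


def mark(packet):
    """Return the DSCP value for a packet; unknown priorities get best effort."""
    phb = PRIORITY_TO_PHB.get(packet.priority, "BE")
    return DSCP[phb]


def schedule(packets, capacity_bytes):
    """Send packets in priority order and drop those that exceed capacity."""
    sent, dropped, used = [], [], 0
    for packet in sorted(packets, key=lambda p: p.priority):
        if used + packet.size <= capacity_bytes:
            sent.append(packet)
            used += packet.size
        else:
            dropped.append(packet)
    return sent, dropped
```

Under this toy drop rule, when capacity runs short the least important objects are discarded first, which is the behavior the abstract contrasts with best-effort delivery.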
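Multidimensional Symbolic Reconstruction and Progressive Synesthetic Resonance in the International Communication of Chinese Mythology IP in the Digital-Intelligence Era
14
Authors: 曹漪那 (Cao Yina), 赵静 (Zhao Jing). 《编辑之友》 (Editorial Friend; PKU Core Journal), 2026, No. 1, pp. 69-76 (8 pages)
Taking several Chinese games and film and television works as typical cases, this article introduces synesthesia theory to examine how the cultural symbols carried by Chinese mythology IP are reconstructed along multiple dimensions and activated synesthetically during digital-intelligent translation. It also examines how such media, through multisensory stimulation, interactive narrative, and immersive experience, reach human instincts and activate human commonalities, so that overseas users subconsciously and progressively develop a cross-cultural synesthetic effect of "perceptual access, emotional coupling, and meaning negotiation", strengthening, both overtly and covertly, the siphon effect and communicative efficacy of the Chinese mythological symbol system. On this basis, the article argues that the digital-intelligent development and synesthetic construction of Chinese mythological resources constitute a new path for the international communication of Chinese civilization today, with multiple values for strengthening cultural identity and building a global community of digital civilization.
Keywords: Chinese mythology IP; digital translation; cultural synesthesia; community of digital civilization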
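Research on the Application of Audio-Visual Synesthesia in Poster Design
15
Authors: 周越 (Zhou Yue), 杨勇 (Yang Yong). 《工业设计》 (Industrial Design), 2026, No. 4, pp. 53-56 (4 pages)
Audio-visual synesthesia design enriches design outcomes by integrating the visual and auditory senses. Poster design remains widely used in contemporary visual communication. In practice, however, it often suffers from an excessive pursuit of form, the piling up of symbols, and insufficient interactivity. Although designers have gradually adopted techniques such as motion graphics and sound-wave visualization, most related research stops at surface descriptions of audio-visual combination and scattered application cases, and lacks a systematic theoretical framework and summary of strategies. This study reviews and analyzes the relevant theoretical foundations and principles, distills two core design paths from case analysis, "structure-function mapping" and "emotion-symbol translation", and on that basis constructs a design strategy model running from principle to practice, with the aim of providing theoretical support and a practical framework for innovation in poster design.
Keywords: industrial design; audio-visual synesthesia; poster design; design strategy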
Research on a Digital Virtual Human Lip Synchronization Optimization Algorithm
16
Authors: FAN Jia-li, ZHAO Si-jia, SI Zhan-jun. 《印刷与数字媒体技术研究》 (Printing and Digital Media Technology Study; PKU Core Journal), 2026, No. 1, pp. 226-235, 250 (11 pages)
Lip synchronization serves as a core technology for enabling natural interactions in digital virtual humans. However, it faces challenges such as insufficient dynamic correspondence between speech and lip movements and inadequate modeling of image details. To address these limitations, a comprehensively optimized lip synchronization framework extending the Wav2Lip architecture was proposed in this study. Firstly, based on the Wav2Lip model, a facial region extraction strategy using facial keypoints was designed, which effectively enhances the robustness of facial alignment during lip synchronization for digital virtual humans. Then, a cross-modal attention fusion module between visual and speech features was introduced to improve cross-modal information fusion, and a dynamic receptive field convolution module was developed in the generation branch to enhance the modeling performance of the lip region. Finally, experiments were conducted on the VFHQ dataset. The proposed method was compared with the Wav2Lip, VideoRetalking, and DI-Net models, and its performance was evaluated using three metrics: LSE-C, CSIM, and FID. Experimental results showed that the proposed method achieves significant improvements in synchronization accuracy and image fidelity, providing an efficient and feasible solution for lip-synthesis tasks of digital virtual humans.
Keywords: Lip synchronization; Digital human; Cross-modal attention; audio-visual synthesis
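Architectural Phenomenology in Practice from the "Lifeworld" Perspective: The Library of the Huxi Campus of Sichuan Fine Arts Institute as an Example
17
Author: 李佳怡 (Li Jiayi). 《城市建筑空间》 (Urban Architecture Space), 2026, No. 1, pp. 105-107 (3 pages)
Amid rapid urbanization, the homogenization of architectural design and the loss of the spirit of place have become major challenges for contemporary architectural practice. Architectural phenomenology advocates a return to the "lifeworld" to reconstruct the essential value of architecture, and has become an important reference for reshaping regional cultural meaning. Taking the library of the Huxi Campus of Sichuan Fine Arts Institute as an example, this article explores paths for reconstructing the spirit of place from the dual perspectives of translating regional genes and using sensory synesthesia as a medium. It proposes site perception, spatial creation, and material expression as core design strategies and reflects on the limitations of functionalism, with the aim of providing a reference for similar architectural practice.
Keywords: architectural phenomenology; spirit of place; sensory synesthesia; library design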
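Interpretive Approach to and Rule Refinement of the "Appropriate Quotation" Clause in the Context of GAI Synesthetic Creation
18
Author: 周子妍 (Zhou Ziyan). 《太原理工大学学报(社会科学版)》 (Journal of Taiyuan University of Technology (Social Science Edition)), 2026, No. 1, pp. 47-57 (11 pages)
In recent years, generative artificial intelligence (GAI) has developed rapidly in China, yet GAI's use of prior works still faces a legitimacy dilemma: there is no direct legal basis, and judicial practice across countries has not yet produced mature experience, which is out of step with the current state of GAI development. As traditional productive forces give way to new quality productive forces, synesthetic creation by humans and GAI is changing how the digital society operates. The primary reality of GAI development creates a real and urgent demand on the secondary legal norms, but those norms cannot meet the needs of that social reality. To address this dilemma, one should return to the existing system and reinterpret its principles, constructing a new interpretive approach to the "appropriate quotation" clause that recognizes the legitimacy of synesthetic creation for commercial purposes. At the same time, the rules for applying the clause should be strictly defined: as to purpose of use, evidentiary requirements should be strict and the limits on purpose should be relaxed only with caution; in judging the proportion of quoted content, quality and quantity should be weighed together, along with the market economic interests affected by the act; and where the act is difficult to assess under the three-step test, its reasonableness despite the harm it causes may be appropriately recognized by evaluating its contribution to the public interest and to the efficiency of market operation. Compared with a rights-granting model, exploring a new interpretive approach to the "appropriate quotation" clause can provide lower-level and more flexible protection for synesthetic creation.
Keywords: appropriate quotation; generative synesthesia; generative artificial intelligence; fair use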
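Sensory Synesthesia in Digital Audio-Visual Production from the Perspective of "Technology-Culture Evolutionism" (Cited by: 4)
19
Authors: 战迪 (Zhan Di), 方杰云 (Fang Jieyun). 《编辑之友》 (Editorial Friend; PKU Core Journal), 2025, No. 2, pp. 56-63 (8 pages)
In a deeply digitalized ecosystem, audio-visual content production fully mobilizes human sensory experience and builds a body of knowledge concerning innovation in technological culture. Starting from the conceptual shift from "technology-culture symbiosis" to "technology-culture evolution", the article explores the technological context, practical paths, and cultural characteristics of the synesthetic production of digital audio-visual content. It argues that sensory synesthesia, as a commensurable experience that crosses races and nations, helps build empathy and consensus within a framework of storytelling. With digital technology, audio-visual content replicates multiple sensory experiences while also stimulating synesthetic effects among the senses, lifting individualized, private experience to a shared, public level and thereby activating a collective turn toward the globalization of perception.
Keywords: technological evolution; sensory synesthesia; digital audio-visual; globalization of perception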
Synesthesia, Experiential Parts, and Conscious Unity
20
Author: Rocco J. Gennaro. Journal of Philosophy Study, 2012, No. 2, pp. 73-80 (8 pages)
Synesthesia is the "union of the senses" whereby two or more of the five senses that are normally experienced separately are involuntarily and automatically joined together in experience. For example, some synesthetes experience a color when they hear a sound or see a letter. In this paper, I examine two cases of synesthesia in light of the notions of "experiential parts" and "conscious unity." I first provide some background on the unity of consciousness and the question of experiential parts. I then describe two very different cases of synesthesia. Finally, I critically examine the cases in light of two central notions of "unity." I argue that there is good reason to think that the neural "vehicles" of conscious states are distributed widely and can include multiple modalities. I also argue that some synesthetic experiences do not really enjoy the same "object unity" associated with normal vision.
Keywords: synesthesia; experiential parts; consciousness; unity; visual perception; auditory perception