Journal Articles
7 articles found
1. Research on Multimodal AIGC Video Detection for Identifying Fake Videos Generated by Large Models
Authors: Yong Liu, Tianning Sun, Daofu Gong, Li Di, Xu Zhao. Computers, Materials & Continua, 2025, No. 10, pp. 1161-1184.
Traditional approaches to detecting high-quality forged videos typically have low recognition accuracy and are easily misled. This paper addresses the challenge of detecting high-quality deepfake videos by improving the accuracy of Artificial Intelligence Generated Content (AIGC) video authenticity detection through multimodal information fusion. First, a high-quality multimodal video dataset is collected and normalized, including resolution correction and frame rate unification. Next, feature extraction techniques draw out features from the visual, audio, and text modalities. These features are then fused into a multimodal feature matrix based on a multilayer perceptron and attention mechanisms. Finally, the matrix is fed into a multimodal information fusion layer to construct and train a deep learning model. Experiments show that the multimodal fusion model achieves 93.8% accuracy on video authenticity detection, a significant improvement over unimodal models, confirming the model's stronger performance and robustness for AIGC video authenticity detection.
Keywords: multimodal information fusion; artificial intelligence generated content; authenticity detection; feature extraction; multi-layer perceptron; attention mechanism
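The pipeline this abstract describes (per-modality features, attention weighting, then an MLP classifier) can be sketched roughly as follows. The embedding size, the single attention vector `W_att`, and the softmax weighting are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(features, W_att):
    """Score each modality vector, normalize the scores with softmax,
    and return the attention-weighted sum as the fused representation."""
    scores = np.array([W_att @ f for f in features])  # one scalar per modality
    weights = softmax(scores)
    fused = sum(w * f for w, f in zip(weights, features))
    return fused, weights

rng = np.random.default_rng(0)
d = 8  # common embedding size (assumption)
visual, audio, text = (rng.normal(size=d) for _ in range(3))
W_att = rng.normal(size=d)
fused, weights = attention_fuse([visual, audio, text], W_att)
# fused would then feed a small MLP that outputs real vs. generated.
```

The weights sum to one, so the fused vector stays on the same scale as the per-modality features regardless of how many modalities are added.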
2. Performance vs. Complexity: Comparative Analysis of Multimodal Bilinear Pooling Fusion Approaches for Deep Learning-Based Visual Arabic-Question Answering Systems
Authors: Sarah M. Kamel, Mai A. Fadel, Lamiaa Elrefaei, Shimaa I. Hassan. Computer Modeling in Engineering & Sciences, 2025, No. 4, pp. 373-411.
Visual question answering (VQA) is a multimodal task involving a deep understanding of the image scene and the question's meaning, and capturing the relevant correlations between both modalities to infer the appropriate answer. In this paper, we propose a VQA system intended to answer yes/no questions about real-world images in Arabic. To support a robust VQA system, we work in two directions: (1) using deep neural networks, namely ResNet-152 and Gated Recurrent Units (GRU), to semantically represent the given image and question in a fine-grained manner; (2) studying the role of the multimodal bilinear pooling fusion technique in the trade-off between model complexity and overall model performance. Some fusion techniques can significantly increase model complexity, which seriously limits their applicability to VQA models. So far, there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions. Hence, a comparative analysis is conducted between eight bilinear pooling fusion techniques in terms of their ability to reduce model complexity and improve model performance for this class of VQA systems. Experiments indicate that these techniques improve the VQA model's performance, reaching a best performance of 89.25%. Further, experiments show that the number of answers in the developed VQA system is a critical factor affecting the effectiveness of these techniques in achieving their main objective of reducing model complexity. The Multimodal Local Perception Bilinear Pooling (MLPB) technique shows the best balance between model complexity and performance for VQA systems designed to answer yes/no questions.
Keywords: Arabic-VQA; deep learning-based VQA; deep multimodal information fusion; multimodal representation learning; VQA of yes/no questions; VQA model complexity; VQA model performance; performance-complexity trade-off
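Bilinear pooling, the family of fusion techniques compared here, captures pairwise interactions between the two modality vectors via an outer product; the full form grows as the product of the two feature sizes, which is exactly the complexity problem the low-rank variants address. A minimal numpy sketch of both forms, with illustrative dimensions (the projections are random stand-ins, not a trained MLPB model):

```python
import numpy as np

def bilinear_pool(img, q):
    """Full bilinear pooling: flattened outer product of the two
    modality vectors -- output size is len(img) * len(q)."""
    return np.outer(img, q).ravel()

def low_rank_bilinear(img, q, U, V):
    """Low-rank approximation: project both modalities into a shared
    k-dim space and take the elementwise (Hadamard) product."""
    return (U @ img) * (V @ q)

rng = np.random.default_rng(1)
img = rng.normal(size=2048)   # e.g., a ResNet-152 image feature
q = rng.normal(size=512)      # e.g., a GRU question encoding
full = bilinear_pool(img, q)  # 1,048,576-dim: impractical to classify over
U = rng.normal(size=(64, 2048))
V = rng.normal(size=(64, 512))
compact = low_rank_bilinear(img, q, U, V)  # 64-dim
```

The contrast in output sizes (over a million dimensions versus 64) is the performance-complexity trade-off the paper's comparative analysis quantifies.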
3. Towards a Multilingual, Multimedia and Multimodal Digital Library Platform
Authors: Tiejun Huang, Yonghong Tian, Chunli Wang, Xiaodong Shi, Wen Gao. Journal of Zhejiang University - Science A (Applied Physics & Engineering), 2005, No. 11, pp. 1188-1192.
The China-US Million Book Digital Library Project (Million Book Project) is an international cooperation program between China and the US. One million digitized books are not the ultimate goal of the project, however, but a first step towards universal access to human knowledge. In particular, there are four challenges around new ways to analyze, process, operate, visualize and interact with the digital media resources in this library. To tackle these challenges, the North China Centre of the Million Book Project (in the Chinese Academy of Sciences) has initiated several innovative research projects in areas such as multimedia content analysis and retrieval, bilingual services, multimodal information presentation, and knowledge-based organization and services. In this keynote speech, we briefly review our work in these areas and argue that, through technological cooperation on these research topics, the project will develop a top-level digital library platform for the million book library.
Keywords: digital library; Million Book Project; multimedia content analysis; multilingual services; multimodal information presentation; knowledge organization
4. Brain Tumor Segmentation Based on the Learning Statistical Texture
Authors: Yufeng Guo, Feiba Chang, Xiaoyu Chen, Fengjun Sun, Zihong Wang. Journal of Artificial Intelligence and Technology, 2024, No. 2, pp. 160-168.
Accurate segmentation of brain tumors in Magnetic Resonance Imaging (MRI) is important for clinical diagnosis and treatment, and efficient extraction and analysis of MRI multimodal feature information is the key to achieving it. In this paper, we propose a multimodal information fusion method for brain tumor segmentation, aimed at fully utilizing multimodal information for accurate segmentation in MRI. Our method includes a Semantic Information Processing Module (SIPM) and a Multimodal Feature Reasoning Module (MFRM): (1) the SIPM provides flexible multiscale feature enhancement and extraction; (2) the MFRM processes both the backbone network feature layer and the semantic feature layer. The proposed method is validated through extensive experiments. Results on the BraTS2018 and BraTS2019 datasets show that the method has unique advantages over existing brain tumor segmentation methods.
Keywords: brain tumor segmentation; convolutional neural networks; edge feature; multimodal information fusion; Transformer
5. Correlation-Based Identification Approach for Multimodal Biometric Fusion
Authors: Ma Xin, Jing Xiaojun. The Journal of China Universities of Posts and Telecommunications, 2017, No. 4, pp. 34-39, 50.
Information fusion is a key step in multimodal biometric systems. Feature-level fusion is more effective than score-level and decision-level methods because the original feature set contains richer information about the biometric data. In this paper, we present a multiset generalized canonical discriminant projection (MGCDP) method for feature-level multimodal biometric information fusion, which maximizes the correlation of intra-class features while minimizing the correlation between classes. In addition, serial MGCDP (S-MGCDP) and parallel MGCDP (P-MGCDP) strategies are proposed, which can fuse more than two kinds of biometric information to achieve better identification. Experiments performed on various biometric databases show that the MGCDP method outperforms other state-of-the-art feature-level information fusion approaches.
Keywords: correlation analysis; multimodal biometrics; information fusion
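MGCDP itself learns the discriminant projections; the serial and parallel strategies then differ only in how the projected per-modality features are combined. A toy sketch of that final step, where the "projected" features are random stand-ins rather than learned MGCDP outputs:

```python
import numpy as np

def serial_fuse(projected):
    """Serial (S-MGCDP-style) strategy: concatenate the projected
    modality features into one longer vector."""
    return np.concatenate(projected)

def parallel_fuse(projected):
    """Parallel (P-MGCDP-style) strategy: combine equal-length
    features by summation, keeping the dimensionality fixed."""
    return np.sum(projected, axis=0)

rng = np.random.default_rng(2)
# Three biometric modalities (e.g., face, fingerprint, iris),
# each already projected into a common 16-dim discriminant space.
projected = [rng.normal(size=16) for _ in range(3)]
serial = serial_fuse(projected)      # 48-dim
parallel = parallel_fuse(projected)  # 16-dim
```

Serial fusion preserves every modality's coordinates at the cost of a growing feature length; parallel fusion keeps the length fixed no matter how many modalities are added, which matters when the matcher's cost scales with dimension.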
6. TIST: Transcriptome and Histopathological Image Integrative Analysis for Spatial Transcriptomics (cited by 1)
Authors: Yiran Shan, Qian Zhang, Wenbo Guo, Yanhong Wu, Yuxin Miao, Hongyi Xin, Qiuyu Lian, Jin Gu. Genomics, Proteomics & Bioinformatics, 2022, No. 5, pp. 974-988.
Sequencing-based spatial transcriptomics (ST) is an emerging technology for studying in situ gene expression patterns at the whole-genome scale. Currently, ST data analysis is still complicated by high technical noise and low resolution. In addition to the transcriptomic data, matched histopathological images are usually generated for the same tissue sample during the ST experiment. These high-resolution images provide complementary cellular phenotype information, offering an opportunity to mitigate the noise in ST data. We present a novel ST data analysis method, transcriptome and histopathological image integrative analysis for ST (TIST), which enables the identification of spatial clusters (SCs) and the enhancement of spatial gene expression patterns through integrative analysis of matched transcriptomic data and images. TIST devises a histopathological feature extraction method based on a Markov random field (MRF) to learn cellular features from the images, and integrates them with the transcriptomic data and location information as a network, termed TIST-net. Based on TIST-net, SCs are identified by a random walk-based strategy, and gene expression patterns are enhanced by neighborhood smoothing. We benchmark TIST on both simulated datasets and 32 real samples against several state-of-the-art methods. Results show that TIST is robust to technical noise across multiple analysis tasks for sequencing-based ST data and can find interesting microstructures in different biological scenarios. TIST is available at http://lifeome.net/software/tist/ and https://ngdc.cncb.ac.cn/biocode/tools/BT007317.
Keywords: spatial transcriptomics; multimodal information integration; network-based analysis; spatial cluster identification; gene expression enhancement
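The abstract's final step, enhancing expression patterns by neighborhood smoothing over the integrated network, amounts to blending each spot's expression with that of its graph neighbors. A minimal sketch on a generic spot graph (the adjacency structure and the blending weight `alpha` are placeholders, not TIST-net or TIST's actual parameters):

```python
import numpy as np

def neighborhood_smooth(expr, adj, alpha=0.5):
    """Blend each spot's expression with the mean of its neighbors:
    alpha * own value + (1 - alpha) * neighbor mean."""
    smoothed = expr.copy()
    for spot, neighbors in adj.items():
        if neighbors:
            smoothed[spot] = (alpha * expr[spot]
                              + (1 - alpha) * expr[neighbors].mean(axis=0))
    return smoothed

# 4 spots x 3 genes, connected as a simple chain 0-1-2-3
expr = np.array([[1.0, 0.0, 2.0],
                 [3.0, 1.0, 0.0],
                 [1.0, 5.0, 1.0],
                 [1.0, 1.0, 1.0]])
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smoothed = neighborhood_smooth(expr, adj)
```

Because smoothing reads from the original matrix rather than updating in place, the result does not depend on the order in which spots are visited; isolated spots (empty neighbor lists) are left unchanged.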
7. A Review of Brain-Inspired Cognition and Navigation Technology for Mobile Robots (cited by 1)
Authors: Yanan Bai, Shiliang Shao, Jin Zhang, Xianzhe Zhao, Chuxi Fang, Ting Wang, Yongliang Wang, Hai Zhao. Cyborg and Bionic Systems, 2024, No. 1, pp. 232-246.
Brain-inspired navigation technologies combine environmental perception, spatial cognition, and target navigation into a comprehensive navigation research system. Researchers have used various sensors to gather environmental data and have enhanced environmental perception through multimodal information fusion. In spatial cognition, neural network models are used to simulate the navigation mechanisms of the animal brain and to construct environmental cognition maps. However, existing models struggle to achieve high navigation success rates and efficiency, and the limited incorporation of navigation mechanisms borrowed from animal brains necessitates further exploration. Following the brain-inspired navigation process, this paper presents a systematic study of brain-inspired environment perception, brain-inspired spatial cognition, and goal-based navigation, providing a new classification of brain-inspired cognition and navigation techniques and a theoretical basis for subsequent experimental studies. In the future, brain-inspired navigation technology should draw on more complete brain-inspired mechanisms to improve its generalization ability and should also be applied to large-scale distributed intelligent agent cluster navigation. The multidisciplinary nature of brain-inspired navigation technology presents challenges, and scholars from multiple disciplines must cooperate to advance its development.
Keywords: neural network model; environmental perception; spatial cognition; target navigation; multimodal information fusion; brain-inspired navigation