For high-quality forged videos, traditional approaches typically have low recognition accuracy and are easily misled. This paper addresses the challenge of detecting high-quality deepfake videos by improving the accuracy of Artificial Intelligence Generated Content (AIGC) video authenticity detection with a multimodal information fusion approach. First, a high-quality multimodal video dataset is collected and normalized, including resolution correction and frame rate unification. Next, feature extraction techniques are employed to draw out features from the visual, audio, and text modalities. Subsequently, these features are fused into a multimodal feature matrix based on a multilayer perceptron and attention mechanisms. Finally, the matrix is fed into a multimodal information fusion layer to construct and train a deep learning model. Experimental findings show that the multimodal fusion model achieves an accuracy of 93.8% for video authenticity detection, a significant improvement over unimodal models, affirming the model's better performance and robustness for AIGC video authenticity detection.
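The fusion pipeline described above (per-modality features combined through attention and a multilayer perceptron) can be sketched roughly as follows. All dimensions, weights, and the single-vector attention scheme are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical per-modality feature vectors (dimensions are illustrative).
visual = rng.standard_normal(128)
audio = rng.standard_normal(128)
text = rng.standard_normal(128)
modalities = np.stack([visual, audio, text])   # (3, 128)

# Attention over modalities: score each modality, normalize, weighted sum.
w_att = rng.standard_normal(128) * 0.1
weights = softmax(modalities @ w_att)          # (3,), sums to 1
fused = weights @ modalities                   # (128,) fused feature vector

# Small MLP head producing a real/fake authenticity score.
W1 = rng.standard_normal((64, 128)) * 0.1
b1 = np.zeros(64)
w2 = rng.standard_normal(64) * 0.1
hidden = np.maximum(0.0, W1 @ fused + b1)      # ReLU
score = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid, in [0, 1]
```

In a trained model the attention weights would be learned so that the more discriminative modality (e.g., audio-visual desynchronization cues) dominates the fused representation.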
Visual question answering (VQA) is a multimodal task involving a deep understanding of the image scene and the question's meaning, and capturing the relevant correlations between both modalities to infer the appropriate answer. In this paper, we propose a VQA system intended to answer yes/no questions about real-world images, in Arabic. To support a robust VQA system, we work in two directions: (1) using deep neural networks, namely ResNet-152 and Gated Recurrent Units (GRU), to semantically represent the given image and question in a fine-grained manner; (2) studying the role of the utilized multimodal bilinear pooling fusion technique in the trade-off between model complexity and overall model performance. Some fusion techniques can significantly increase model complexity, which seriously limits their applicability to VQA models. So far, there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions. Hence, a comparative analysis is conducted between eight bilinear pooling fusion techniques in terms of their ability to reduce model complexity and improve model performance for this class of VQA systems. Experiments indicate that these multimodal bilinear pooling fusion techniques improved the VQA model's performance, reaching a best performance of 89.25%. Further, experiments have proven that the number of answers in the developed VQA system is a critical factor that affects the effectiveness of these multimodal bilinear pooling techniques in achieving their main objective of reducing model complexity. The Multimodal Local Perception Bilinear Pooling (MLPB) technique has shown the best balance between model complexity and performance for VQA systems designed to answer yes/no questions.
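As a concrete illustration of the bilinear-pooling family the comparison above draws from, here is a low-rank (Hadamard-product) bilinear fusion sketch. The dimensions, projection matrices, and classifier are assumptions for illustration and do not reproduce the MLPB technique itself:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative embeddings: a GRU-style question vector and a ResNet-152-style
# pooled image vector (values are random placeholders).
q = rng.standard_normal(2400)   # question embedding
v = rng.standard_normal(2048)   # image embedding
d = 1000                        # joint embedding dimension

# Low-rank bilinear pooling: project both modalities into a shared space and
# take the elementwise (Hadamard) product instead of a full outer product,
# cutting parameters from ~2400*2048*d to (2400 + 2048)*d.
U = rng.standard_normal((2400, d)) * 0.01
V = rng.standard_normal((2048, d)) * 0.01
fused = np.tanh(q @ U) * np.tanh(v @ V)    # (d,) joint representation

# Linear classifier over the fused vector for the yes/no answer.
w = rng.standard_normal(d) * 0.01
p_yes = 1.0 / (1.0 + np.exp(-(fused @ w)))
answer = "yes" if p_yes >= 0.5 else "no"
```

The parameter-count arithmetic in the comment is the crux of the complexity trade-off the abstract studies: full bilinear interaction is quadratic in the input dimensions, while the low-rank factorization is linear.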
The China-US Million Book Digital Library Project (Million Book Project) is an international cooperation program between China and the US. However, one million digitized books are considered not the ultimate goal of the project, but a first step toward universal access to human knowledge. In particular, there are four challenges concerning new ways to analyze, process, operate, visualize, and interact with digital media resources in this library. To tackle these challenges, the North China Centre of the Million Book Project (in the Chinese Academy of Sciences) has initiated several innovative research projects in areas such as multimedia content analysis and retrieval, bilingual services, multimodal information presentation, and knowledge-based organization and services. In this keynote speech, we briefly review our work in these areas and argue that, through technological cooperation on these innovative research topics, the project will develop a top-level digital library platform for the million book library.
Achieving accurate segmentation of brain tumors in Magnetic Resonance Imaging (MRI) is important for clinical diagnosis and accurate treatment, and the efficient extraction and analysis of MRI multimodal feature information is the key to achieving accurate segmentation. In this paper, we propose a multimodal information fusion method for brain tumor segmentation, aimed at fully utilizing multimodal information for accurate segmentation in MRI. Our method includes a Semantic Information Processing Module (SIPM) and a Multimodal Feature Reasoning Module (MFRM): (1) the SIPM is introduced to achieve free multiscale feature enhancement and extraction; (2) the MFRM is constructed to process both the backbone network feature information layer and the semantic feature information layer. The proposed method is validated through extensive experiments. Experimental results on the BraTS2018 and BraTS2019 datasets show that the method has unique advantages over existing brain tumor segmentation methods.
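As background for how multimodal MRI inputs are typically combined before any module such as the SIPM or MFRM processes them, a minimal early-fusion sketch stacks the four standard BraTS modalities as input channels. The toy volume size and z-score normalization here are illustrative assumptions, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)

# The four standard BraTS MRI modalities for one toy-sized volume
# (real BraTS volumes are 240 x 240 x 155 voxels).
shape = (8, 8, 8)
t1    = rng.random(shape)
t1ce  = rng.random(shape)
t2    = rng.random(shape)
flair = rng.random(shape)

def zscore(x):
    # Per-modality intensity normalization, since MRI intensities
    # are not on a common scale across sequences.
    return (x - x.mean()) / (x.std() + 1e-8)

# Early fusion: stack normalized modalities as input channels of one
# tensor, the usual input layout for a segmentation network.
fused_input = np.stack([zscore(m) for m in (t1, t1ce, t2, flair)])
print(fused_input.shape)  # (4, 8, 8, 8): channels x depth x height x width
```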
Information fusion is a key step in multimodal biometric systems. Feature-level fusion is more effective than score-level and decision-level methods because the original feature set contains richer information about the biometric data. In this paper, we present a multiset generalized canonical discriminant projection (MGCDP) method for feature-level multimodal biometric information fusion, which maximizes the correlation of intra-class features while minimizing the correlation of between-class features. In addition, serial MGCDP (S-MGCDP) and parallel MGCDP (P-MGCDP) strategies are proposed, which can fuse more than two kinds of biometric information to achieve a better identification effect. Experiments performed on various biometric databases show that the MGCDP method outperforms other state-of-the-art feature-level information fusion approaches.
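The serial and parallel strategies named above (S-MGCDP, P-MGCDP) build on two classic feature-combination schemes, sketched below without the canonical discriminant projection itself; the feature dimensions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hypothetical biometric feature vectors (e.g., face and fingerprint).
x = rng.standard_normal(64)
y = rng.standard_normal(48)

# Serial fusion: concatenate into one longer real vector,
# dimension 64 + 48 = 112.
serial = np.concatenate([x, y])

# Parallel fusion: zero-pad to equal length, then combine as a single
# complex vector x + iy, keeping the dimension at max(64, 48) = 64.
n = max(x.size, y.size)
xp = np.pad(x, (0, n - x.size))
yp = np.pad(y, (0, n - y.size))
parallel = xp + 1j * yp
```

Serial fusion grows the feature dimension with each added modality, while parallel fusion keeps it bounded at the cost of requiring complex-valued downstream processing; this is the basic trade-off between the two strategy families.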
Sequencing-based spatial transcriptomics (ST) is an emerging technology for studying in situ gene expression patterns at the whole-genome scale. Currently, ST data analysis is still complicated by high technical noise and low resolution. In addition to the transcriptomic data, matched histopathological images are usually generated for the same tissue sample during the ST experiment. These matched high-resolution histopathological images provide complementary cellular phenotypical information, offering an opportunity to mitigate the noise in ST data. We present a novel ST data analysis method called transcriptome and histopathological image integrative analysis for ST (TIST), which enables the identification of spatial clusters (SCs) and the enhancement of spatial gene expression patterns through integrative analysis of matched transcriptomic data and images. TIST devises a histopathological feature extraction method based on a Markov random field (MRF) to learn cellular features from histopathological images, and integrates them with the transcriptomic data and location information as a network, termed TIST-net. Based on TIST-net, SCs are identified by a random walk-based strategy, and gene expression patterns are enhanced by neighborhood smoothing. We benchmark TIST on both simulated datasets and 32 real samples against several state-of-the-art methods. Results show that TIST is robust to technical noise on multiple analysis tasks for sequencing-based ST data and can find interesting microstructures in different biological scenarios. TIST is available at http://lifeome.net/software/tist/ and https://ngdc.cncb.ac.cn/biocode/tools/BT007317.
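The neighborhood-smoothing step described above can be sketched as a random-walk average over a spot graph. The adjacency matrix, gene values, and blending weight below are toy assumptions, not TIST-net itself:

```python
import numpy as np

# Toy network: 5 spots, one gene's expression values, and a symmetric
# adjacency matrix encoding spatial/histological neighbor relations.
expr = np.array([1.0, 5.0, 1.2, 0.9, 4.8])
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
], dtype=float)

# Row-normalize the adjacency into a random-walk transition matrix, then
# blend each spot's value with the average of its neighbors.
P = adj / adj.sum(axis=1, keepdims=True)
alpha = 0.5                                  # weight kept for the spot itself
smoothed = alpha * expr + (1 - alpha) * P @ expr

# Spot 0 has neighbors 1 and 2, so its smoothed value is
# 0.5 * 1.0 + 0.5 * (5.0 + 1.2) / 2 = 2.05.
```

Smoothing of this kind suppresses dropout-style technical noise at individual spots while preserving expression patterns shared by graph neighborhoods.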
Brain-inspired navigation technologies combine environmental perception, spatial cognition, and target navigation to create a comprehensive navigation research system. Researchers have used various sensors to gather environmental data and enhance environmental perception through multimodal information fusion. In spatial cognition, a neural network model is used to simulate the navigation mechanism of the animal brain and to construct an environmental cognition map. However, existing models face challenges in achieving a high navigation success rate and efficiency. In addition, the limited incorporation of navigation mechanisms borrowed from animal brains necessitates further exploration. Based on the brain-inspired navigation process, this paper presents a systematic study of brain-inspired environment perception, brain-inspired spatial cognition, and goal-based navigation, providing a new classification of brain-inspired cognition and navigation techniques and a theoretical basis for subsequent experimental studies. In the future, brain-inspired navigation technology should learn from more complete brain-inspired mechanisms to improve its generalization ability and be applied to large-scale distributed intelligent agent cluster navigation. The multidisciplinary nature of brain-inspired navigation technology presents challenges, and scholars from multiple disciplines must cooperate to promote its development.
Funding for the TIST work: supported by the National Key R&D Program of China (Grant Nos. 2020YFA0712403 and 2021YFF1200901), the National Natural Science Foundation of China (Grant Nos. 61922047, 81890993, 61721003, and 62133006), the Beijing National Research Centre for Information Science and Technology Young Innovation Fund, China (Grant No. BNR2020RC01009), and the Science and Technology Commission of Shanghai Municipality, China (Grant No. 20PJ1408300).
Funding for the brain-inspired navigation work: supported by the Applied Basic Research Program Project of Liaoning Province (grant numbers 2023JH2/101300141 and 2022JH2/101300102), the Shenyang Science and Technology Program (grant number 23-407-3-38), the Joint Fund Project of the National Natural Science Foundation of China (grant number U20A20201), the National Science Foundation of China (grant number 62103406), and the Autonomous Project of the State Key Laboratory of Robotics (grant numbers 2022-Z02 and 2022-Z19).