期刊文献+
共找到25,679篇文章
< 1 2 250 >
每页显示 20 50 100
Cross-Modal Simplex Center Learning for Speech-Face Association
1
作者 Qiming Ma Fanliang Bu +3 位作者 Rong Wang Lingbin Bu Yifan Wang Zhiyuan Li 《Computers, Materials & Continua》 2025年第3期5169-5184,共16页
Speech-face association aims to achieve identity matching between facial images and voice segments by aligning cross-modal features.Existing research primarily focuses on learning shared-space representations and comp... Speech-face association aims to achieve identity matching between facial images and voice segments by aligning cross-modal features.Existing research primarily focuses on learning shared-space representations and computing one-to-one similarities between cross-modal sample pairs to establish their correlation.However,these approaches do not fully account for intra-class variations between the modalities or the many-to-many relationships among cross-modal samples,which are crucial for robust association modeling.To address these challenges,we propose a novel framework that leverages global information to align voice and face embeddings while effectively correlating identity information embedded in both modalities.First,we jointly pre-train face recognition and speaker recognition networks to encode discriminative features from facial images and voice segments.This shared pre-training step ensures the extraction of complementary identity information across modalities.Subsequently,we introduce a cross-modal simplex center loss,which aligns samples with identity centers located at the vertices of a regular simplex inscribed on a hypersphere.This design enforces an equidistant and balanced distribution of identity embeddings,reducing intra-class variations.Furthermore,we employ an improved triplet center loss that emphasizes hard sample mining and optimizes inter-class separability,enhancing the model’s ability to generalize across challenging scenarios.Extensive experiments validate the effectiveness of our framework,demonstrating superior performance across various speech-face association tasks,including matching,verification,and retrieval.Notably,in the challenging gender-constrained matching task,our method achieves a remarkable accuracy of 79.22%,significantly outperforming existing approaches.These results highlight the potential of the proposed framework to advance the state of the art in cross-modal identity association. 展开更多
关键词 Speech-face association cross-modal learning cross-modal matching cross-modal retrieval
在线阅读 下载PDF
Robust Audio-Visual Fusion for Emotion Recognition Based on Cross-Modal Learning under Noisy Conditions
2
作者 A-Seong Moon Seungyeon Jeong +3 位作者 Donghee Kim Mohd Asyraf Zulkifley Bong-Soo Sohn Jaesung Lee 《Computers, Materials & Continua》 2025年第11期2851-2872,共22页
Emotion recognition under uncontrolled and noisy environments presents persistent challenges in the design of emotionally responsive systems.The current study introduces an audio-visual recognition framework designed ... Emotion recognition under uncontrolled and noisy environments presents persistent challenges in the design of emotionally responsive systems.The current study introduces an audio-visual recognition framework designed to address performance degradation caused by environmental interference,such as background noise,overlapping speech,and visual obstructions.The proposed framework employs a structured fusion approach,combining early-stage feature-level integration with decision-level coordination guided by temporal attention mechanisms.Audio data are transformed into mel-spectrogram representations,and visual data are represented as raw frame sequences.Spatial and temporal features are extracted through convolutional and transformer-based encoders,allowing the framework to capture complementary and hierarchical information fromboth sources.Across-modal attentionmodule enables selective emphasis on relevant signals while suppressing modality-specific noise.Performance is validated on a modified version of the AFEW dataset,in which controlled noise is introduced to emulate realistic conditions.The framework achieves higher classification accuracy than comparative baselines,confirming increased robustness under conditions of cross-modal disruption.This result demonstrates the suitability of the proposed method for deployment in practical emotion-aware technologies operating outside controlled environments.The study also contributes a systematic approach to fusion design and supports further exploration in the direction of resilientmultimodal emotion analysis frameworks.The source code is publicly available at https://github.com/asmoon002/AVER(accessed on 18 August 2025). 展开更多
关键词 Multimodal learning emotion recognition cross-modal attention robust representation learning
在线阅读 下载PDF
MSCM-Net:Rail Surface Defect Detection Based on a Multi-Scale Cross-Modal Network
3
作者 Xin Wen Xiao Zheng Yu He 《Computers, Materials & Continua》 2025年第3期4371-4388,共18页
Detecting surface defects on unused rails is crucial for evaluating rail quality and durability to ensure the safety of rail transportation.However,existing detection methods often struggle with challenges such as com... Detecting surface defects on unused rails is crucial for evaluating rail quality and durability to ensure the safety of rail transportation.However,existing detection methods often struggle with challenges such as complex defect morphology,texture similarity,and fuzzy edges,leading to poor accuracy and missed detections.In order to resolve these problems,we propose MSCM-Net(Multi-Scale Cross-Modal Network),a multiscale cross-modal framework focused on detecting rail surface defects.MSCM-Net introduces an attention mechanism to dynamically weight the fusion of RGB and depth maps,effectively capturing and enhancing features at different scales for each modality.To further enrich feature representation and improve edge detection in blurred areas,we propose a multi-scale void fusion module that integrates multi-scale feature information.To improve cross-modal feature fusion,we develop a cross-enhanced fusion module that transfers fused features between layers to incorporate interlayer information.We also introduce a multimodal feature integration module,which merges modality-specific features from separate decoders into a shared decoder,enhancing detection by leveraging richer complementary information.Finally,we validate MSCM-Net on the NEU RSDDS-AUG RGB-depth dataset,comparing it against 12 leading methods,and the results show that MSCM-Net achieves superior performance on all metrics. 展开更多
关键词 Surface defect detection multiscale framework cross-modal fusion edge detection
在线阅读 下载PDF
Fake News Detection Based on Cross-Modal Ambiguity Computation and Multi-Scale Feature Fusion
4
作者 Jianxiang Cao Jinyang Wu +5 位作者 Wenqian Shang Chunhua Wang Kang Song Tong Yi Jiajun Cai Haibin Zhu 《Computers, Materials & Continua》 2025年第5期2659-2675,共17页
With the rapid growth of socialmedia,the spread of fake news has become a growing problem,misleading the public and causing significant harm.As social media content is often composed of both images and text,the use of... With the rapid growth of socialmedia,the spread of fake news has become a growing problem,misleading the public and causing significant harm.As social media content is often composed of both images and text,the use of multimodal approaches for fake news detection has gained significant attention.To solve the problems existing in previous multi-modal fake news detection algorithms,such as insufficient feature extraction and insufficient use of semantic relations between modes,this paper proposes the MFFFND-Co(Multimodal Feature Fusion Fake News Detection with Co-Attention Block)model.First,the model deeply explores the textual content,image content,and frequency domain features.Then,it employs a Co-Attention mechanism for cross-modal fusion.Additionally,a semantic consistency detectionmodule is designed to quantify semantic deviations,thereby enhancing the performance of fake news detection.Experimentally verified on two commonly used datasets,Twitter and Weibo,the model achieved F1 scores of 90.0% and 94.0%,respectively,significantly outperforming the pre-modified MFFFND(Multimodal Feature Fusion Fake News Detection with Attention Block)model and surpassing other baseline models.This improves the accuracy of detecting fake information in artificial intelligence detection and engineering software detection. 展开更多
关键词 Fake news detection MULTIMODAL cross-modal ambiguity computation multi-scale feature fusion
在线阅读 下载PDF
PhishNet: A Real-Time, Scalable Ensemble Framework for Smishing Attack Detection Using Transformers and LLMs
5
作者 Abeer Alhuzali Qamar Al-Qahtani +2 位作者 Asmaa Niyazi Lama Alshehri Fatemah Alharbi 《Computers, Materials & Continua》 2026年第1期2194-2212,共19页
The surge in smishing attacks underscores the urgent need for robust,real-time detection systems powered by advanced deep learning models.This paper introduces PhishNet,a novel ensemble learning framework that integra... The surge in smishing attacks underscores the urgent need for robust,real-time detection systems powered by advanced deep learning models.This paper introduces PhishNet,a novel ensemble learning framework that integrates transformer-based models(RoBERTa)and large language models(LLMs)(GPT-OSS 120B,LLaMA3.370B,and Qwen332B)to enhance smishing detection performance significantly.To mitigate class imbalance,we apply synthetic data augmentation using T5 and leverage various text preprocessing techniques.Our system employs a duallayer voting mechanism:weighted majority voting among LLMs and a final ensemble vote to classify messages as ham,spam,or smishing.Experimental results show an average accuracy improvement from 96%to 98.5%compared to the best standalone transformer,and from 93%to 98.5%when compared to LLMs across datasets.Furthermore,we present a real-time,user-friendly application to operationalize our detection model for practical use.PhishNet demonstrates superior scalability,usability,and detection accuracy,filling critical gaps in current smishing detection methodologies. 展开更多
关键词 Smishing attack detection phishing attacks ensemble learning CYBERSECURITY deep learning transformer-based models large language models
在线阅读 下载PDF
Unveiling Zero-Click Attacks: Mapping MITRE ATT&CK Framework for Enhanced Cybersecurity
6
作者 Md Shohel Rana Tonmoy Ghosh +2 位作者 Mohammad Nur Nobi Anichur Rahman Andrew HSung 《Computers, Materials & Continua》 2026年第1期29-66,共38页
Zero-click attacks represent an advanced cybersecurity threat,capable of compromising devices without user interaction.High-profile examples such as Pegasus,Simjacker,Bluebugging,and Bluesnarfing exploit hidden vulner... Zero-click attacks represent an advanced cybersecurity threat,capable of compromising devices without user interaction.High-profile examples such as Pegasus,Simjacker,Bluebugging,and Bluesnarfing exploit hidden vulnerabilities in software and communication protocols to silently gain access,exfiltrate data,and enable long-term surveillance.Their stealth and ability to evade traditional defenses make detection and mitigation highly challenging.This paper addresses these threats by systematically mapping the tactics and techniques of zero-click attacks using the MITRE ATT&CK framework,a widely adopted standard for modeling adversarial behavior.Through this mapping,we categorize real-world attack vectors and better understand how such attacks operate across the cyber-kill chain.To support threat detection efforts,we propose an Active Learning-based method to efficiently label the Pegasus spyware dataset in alignment with the MITRE ATT&CK framework.This approach reduces the effort of manually annotating data while improving the quality of the labeled data,which is essential to train robust cybersecurity models.In addition,our analysis highlights the structured execution paths of zero-click attacks and reveals gaps in current defense strategies.The findings emphasize the importance of forward-looking strategies such as continuous surveillance,dynamic threat profiling,and security education.By bridging zero-click attack analysis with the MITRE ATT&CK framework and leveraging machine learning for dataset annotation,this work provides a foundation for more accurate threat detection and the development of more resilient and structured cybersecurity frameworks. 展开更多
关键词 Bluebugging bluesnarfing CYBERSECURITY MITRE ATT&CK PEGASUS simjacker zero-click attacks
在线阅读 下载PDF
A Novel Unsupervised Structural Attack and Defense for Graph Classification
7
作者 Yadong Wang Zhiwei Zhang +2 位作者 Pengpeng Qiao Ye Yuan Guoren Wang 《Computers, Materials & Continua》 2026年第1期1761-1782,共22页
Graph Neural Networks(GNNs)have proven highly effective for graph classification across diverse fields such as social networks,bioinformatics,and finance,due to their capability to learn complex graph structures.Howev... Graph Neural Networks(GNNs)have proven highly effective for graph classification across diverse fields such as social networks,bioinformatics,and finance,due to their capability to learn complex graph structures.However,despite their success,GNNs remain vulnerable to adversarial attacks that can significantly degrade their classification accuracy.Existing adversarial attack strategies primarily rely on label information to guide the attacks,which limits their applicability in scenarios where such information is scarce or unavailable.This paper introduces an innovative unsupervised attack method for graph classification,which operates without relying on label information,thereby enhancing its applicability in a broad range of scenarios.Specifically,our method first leverages a graph contrastive learning loss to learn high-quality graph embeddings by comparing different stochastic augmented views of the graphs.To effectively perturb the graphs,we then introduce an implicit estimator that measures the impact of various modifications on graph structures.The proposed strategy identifies and flips edges with the top-K highest scores,determined by the estimator,to maximize the degradation of the model’s performance.In addition,to defend against such attack,we propose a lightweight regularization-based defense mechanism that is specifically tailored to mitigate the structural perturbations introduced by our attack strategy.It enhances model robustness by enforcing embedding consistency and edge-level smoothness during training.We conduct experiments on six public TU graph classification datasets:NCI1,NCI109,Mutagenicity,ENZYMES,COLLAB,and DBLP_v1,to evaluate the effectiveness of our attack and defense strategies.Under an attack budget of 3,the maximum reduction in model accuracy reaches 6.67%on the Graph Convolutional Network(GCN)and 11.67%on the Graph Attention Network(GAT)across different datasets,indicating that our unsupervised method induces degradation comparable to state-of-the-art supervised attacks.Meanwhile,our defense achieves the highest accuracy recovery of 3.89%(GCN)and 5.00%(GAT),demonstrating improved robustness against structural perturbations. 展开更多
关键词 Graph classification graph neural networks adversarial attack
在线阅读 下载PDF
Gradient-Guided Assembly Instruction Relocation for Adversarial Attacks Against Binary Code Similarity Detection
8
作者 Ran Wei Hui Shu 《Computers, Materials & Continua》 2026年第1期1372-1394,共23页
Transformer-based models have significantly advanced binary code similarity detection(BCSD)by leveraging their semantic encoding capabilities for efficient function matching across diverse compilation settings.Althoug... Transformer-based models have significantly advanced binary code similarity detection(BCSD)by leveraging their semantic encoding capabilities for efficient function matching across diverse compilation settings.Although adversarial examples can strategically undermine the accuracy of BCSD models and protect critical code,existing techniques predominantly depend on inserting artificial instructions,which incur high computational costs and offer limited diversity of perturbations.To address these limitations,we propose AIMA,a novel gradient-guided assembly instruction relocation method.Our method decouples the detection model into tokenization,embedding,and encoding layers to enable efficient gradient computation.Since token IDs of instructions are discrete and nondifferentiable,we compute gradients in the continuous embedding space to evaluate the influence of each token.The most critical tokens are identified by calculating the L2 norm of their embedding gradients.We then establish a mapping between instructions and their corresponding tokens to aggregate token-level importance into instructionlevel significance.To maximize adversarial impact,a sliding window algorithm selects the most influential contiguous segments for relocation,ensuring optimal perturbation with minimal length.This approach efficiently locates critical code regions without expensive search operations.The selected segments are relocated outside their original function boundaries via a jump mechanism,which preserves runtime control flow and functionality while introducing“deletion”effects in the static instruction sequence.Extensive experiments show that AIMA reduces similarity scores by up to 35.8%in state-of-the-art BCSD models.When incorporated into training data,it also enhances model robustness,achieving a 5.9%improvement in AUROC. 展开更多
关键词 Assembly instruction relocation adversary attack binary code similarity detection
在线阅读 下载PDF
Impact of Data Processing Techniques on AI Models for Attack-Based Imbalanced and Encrypted Traffic within IoT Environments
9
作者 Yeasul Kim Chaeeun Won Hwankuk Kim 《Computers, Materials & Continua》 2026年第1期247-274,共28页
With the increasing emphasis on personal information protection,encryption through security protocols has emerged as a critical requirement in data transmission and reception processes.Nevertheless,IoT ecosystems comp... With the increasing emphasis on personal information protection,encryption through security protocols has emerged as a critical requirement in data transmission and reception processes.Nevertheless,IoT ecosystems comprise heterogeneous networks where outdated systems coexist with the latest devices,spanning a range of devices from non-encrypted ones to fully encrypted ones.Given the limited visibility into payloads in this context,this study investigates AI-based attack detection methods that leverage encrypted traffic metadata,eliminating the need for decryption and minimizing system performance degradation—especially in light of these heterogeneous devices.Using the UNSW-NB15 and CICIoT-2023 dataset,encrypted and unencrypted traffic were categorized according to security protocol,and AI-based intrusion detection experiments were conducted for each traffic type based on metadata.To mitigate the problem of class imbalance,eight different data sampling techniques were applied.The effectiveness of these sampling techniques was then comparatively analyzed using two ensemble models and three Deep Learning(DL)models from various perspectives.The experimental results confirmed that metadata-based attack detection is feasible using only encrypted traffic.In the UNSW-NB15 dataset,the f1-score of encrypted traffic was approximately 0.98,which is 4.3%higher than that of unencrypted traffic(approximately 0.94).In addition,analysis of the encrypted traffic in the CICIoT-2023 dataset using the same method showed a significantly lower f1-score of roughly 0.43,indicating that the quality of the dataset and the preprocessing approach have a substantial impact on detection performance.Furthermore,when data sampling techniques were applied to encrypted traffic,the recall in the UNSWNB15(Encrypted)dataset improved by up to 23.0%,and in the CICIoT-2023(Encrypted)dataset by 20.26%,showing a similar level of improvement.Notably,in CICIoT-2023,f1-score and Receiver Operation Characteristic-Area Under the Curve(ROC-AUC)increased by 59.0%and 55.94%,respectively.These results suggest that data sampling can have a positive effect even in encrypted environments.However,the extent of the improvement may vary depending on data quality,model architecture,and sampling strategy. 展开更多
关键词 Encrypted traffic attack detection data sampling technique AI-based detection IoT environment
在线阅读 下载PDF
Towards Decentralized IoT Security: Optimized Detection of Zero-Day Multi-Class Cyber-Attacks Using Deep Federated Learning
10
作者 Misbah Anwer Ghufran Ahmed +3 位作者 Maha Abdelhaq Raed Alsaqour Shahid Hussain Adnan Akhunzada 《Computers, Materials & Continua》 2026年第1期744-758,共15页
The exponential growth of the Internet of Things(IoT)has introduced significant security challenges,with zero-day attacks emerging as one of the most critical and challenging threats.Traditional Machine Learning(ML)an... The exponential growth of the Internet of Things(IoT)has introduced significant security challenges,with zero-day attacks emerging as one of the most critical and challenging threats.Traditional Machine Learning(ML)and Deep Learning(DL)techniques have demonstrated promising early detection capabilities.However,their effectiveness is limited when handling the vast volumes of IoT-generated data due to scalability constraints,high computational costs,and the costly time-intensive process of data labeling.To address these challenges,this study proposes a Federated Learning(FL)framework that leverages collaborative and hybrid supervised learning to enhance cyber threat detection in IoT networks.By employing Deep Neural Networks(DNNs)and decentralized model training,the approach reduces computational complexity while improving detection accuracy.The proposed model demonstrates robust performance,achieving accuracies of 94.34%,99.95%,and 87.94%on the publicly available kitsune,Bot-IoT,and UNSW-NB15 datasets,respectively.Furthermore,its ability to detect zero-day attacks is validated through evaluations on two additional benchmark datasets,TON-IoT and IoT-23,using a Deep Federated Learning(DFL)framework,underscoring the generalization and effectiveness of the model in heterogeneous and decentralized IoT environments.Experimental results demonstrate superior performance over existing methods,establishing the proposed framework as an efficient and scalable solution for IoT security. 展开更多
关键词 Cyber-attack intrusion detection system(IDS) deep federated learning(DFL) zero-day attack distributed denial of services(DDoS) MULTI-CLASS Internet of Things(IoT)
在线阅读 下载PDF
Multimodal Sentiment Analysis Based on a Cross-Modal Multihead Attention Mechanism 被引量:1
11
作者 Lujuan Deng Boyi Liu Zuhe Li 《Computers, Materials & Continua》 SCIE EI 2024年第1期1157-1170,共14页
Multimodal sentiment analysis aims to understand people’s emotions and opinions from diverse data.Concate-nating or multiplying various modalities is a traditional multi-modal sentiment analysis fusion method.This fu... Multimodal sentiment analysis aims to understand people’s emotions and opinions from diverse data.Concate-nating or multiplying various modalities is a traditional multi-modal sentiment analysis fusion method.This fusion method does not utilize the correlation information between modalities.To solve this problem,this paper proposes amodel based on amulti-head attention mechanism.First,after preprocessing the original data.Then,the feature representation is converted into a sequence of word vectors and positional encoding is introduced to better understand the semantic and sequential information in the input sequence.Next,the input coding sequence is fed into the transformer model for further processing and learning.At the transformer layer,a cross-modal attention consisting of a pair of multi-head attention modules is employed to reflect the correlation between modalities.Finally,the processed results are input into the feedforward neural network to obtain the emotional output through the classification layer.Through the above processing flow,the model can capture semantic information and contextual relationships and achieve good results in various natural language processing tasks.Our model was tested on the CMU Multimodal Opinion Sentiment and Emotion Intensity(CMU-MOSEI)and Multimodal EmotionLines Dataset(MELD),achieving an accuracy of 82.04% and F1 parameters reached 80.59% on the former dataset. 展开更多
关键词 Emotion analysis deep learning cross-modal attention mechanism
在线阅读 下载PDF
Mechanism of Cross-modal Information Influencing Taste 被引量:1
12
作者 Pei LIANG Jia-yu JIANG +2 位作者 Qiang LIU Su-lin ZHANG Hua-jing YANG 《Current Medical Science》 SCIE CAS 2020年第3期474-479,共6页
Studies on the integration of cross-modal information with taste perception has been mostly limited to uni-modal level.The cross-modal sensory interaction and the neural network of information processing and its contr... Studies on the integration of cross-modal information with taste perception has been mostly limited to uni-modal level.The cross-modal sensory interaction and the neural network of information processing and its control were not fully explored and the mechanisms remain poorly understood.This mini review investigated the impact of uni-modal and multi-modal information on the taste perception,from the perspective of cognitive status,such as emotion,expectation and attention,and discussed the hypothesis that the cognitive status is the key step for visual sense to exert influence on taste.This work may help researchers better understand the mechanism of cross-modal information processing and further develop neutrally-based artificial intelligent(AI)system. 展开更多
关键词 cross-modal information integration cognitive status taste perception
暂未订购
ViT2CMH:Vision Transformer Cross-Modal Hashing for Fine-Grained Vision-Text Retrieval 被引量:1
13
作者 Mingyong Li Qiqi Li +1 位作者 Zheng Jiang Yan Ma 《Computer Systems Science & Engineering》 SCIE EI 2023年第8期1401-1414,共14页
In recent years,the development of deep learning has further improved hash retrieval technology.Most of the existing hashing methods currently use Convolutional Neural Networks(CNNs)and Recurrent Neural Networks(RNNs)... In recent years,the development of deep learning has further improved hash retrieval technology.Most of the existing hashing methods currently use Convolutional Neural Networks(CNNs)and Recurrent Neural Networks(RNNs)to process image and text information,respectively.This makes images or texts subject to local constraints,and inherent label matching cannot capture finegrained information,often leading to suboptimal results.Driven by the development of the transformer model,we propose a framework called ViT2CMH mainly based on the Vision Transformer to handle deep Cross-modal Hashing tasks rather than CNNs or RNNs.Specifically,we use a BERT network to extract text features and use the vision transformer as the image network of the model.Finally,the features are transformed into hash codes for efficient and fast retrieval.We conduct extensive experiments on Microsoft COCO(MS-COCO)and Flickr30K,comparing with baselines of some hashing methods and image-text matching methods,showing that our method has better performance. 展开更多
关键词 Hash learning cross-modal retrieval fine-grained matching TRANSFORMER
在线阅读 下载PDF
A Multi-Level Circulant Cross-Modal Transformer for Multimodal Speech Emotion Recognition 被引量:1
14
作者 Peizhu Gong Jin Liu +3 位作者 Zhongdai Wu Bing Han YKenWang Huihua He 《Computers, Materials & Continua》 SCIE EI 2023年第2期4203-4220,共18页
Speech emotion recognition,as an important component of humancomputer interaction technology,has received increasing attention.Recent studies have treated emotion recognition of speech signals as a multimodal task,due... Speech emotion recognition,as an important component of humancomputer interaction technology,has received increasing attention.Recent studies have treated emotion recognition of speech signals as a multimodal task,due to its inclusion of the semantic features of two different modalities,i.e.,audio and text.However,existing methods often fail in effectively represent features and capture correlations.This paper presents a multi-level circulant cross-modal Transformer(MLCCT)formultimodal speech emotion recognition.The proposed model can be divided into three steps,feature extraction,interaction and fusion.Self-supervised embedding models are introduced for feature extraction,which give a more powerful representation of the original data than those using spectrograms or audio features such as Mel-frequency cepstral coefficients(MFCCs)and low-level descriptors(LLDs).In particular,MLCCT contains two types of feature interaction processes,where a bidirectional Long Short-term Memory(Bi-LSTM)with circulant interaction mechanism is proposed for low-level features,while a two-stream residual cross-modal Transformer block is appliedwhen high-level features are involved.Finally,we choose self-attention blocks for fusion and a fully connected layer to make predictions.To evaluate the performance of our proposed model,comprehensive experiments are conducted on three widely used benchmark datasets including IEMOCAP,MELD and CMU-MOSEI.The competitive results verify the effectiveness of our approach. 展开更多
关键词 Speech emotion recognition self-supervised embedding model cross-modal transformer self-attention
在线阅读 下载PDF
Cross-Modal Consistency with Aesthetic Similarity for Multimodal False Information Detection 被引量:1
15
作者 Weijian Fan Ziwei Shi 《Computers, Materials & Continua》 SCIE EI 2024年第5期2723-2741,共19页
With the explosive growth of false information on social media platforms, the automatic detection of multimodalfalse information has received increasing attention. Recent research has significantly contributed to mult... With the explosive growth of false information on social media platforms, the automatic detection of multimodalfalse information has received increasing attention. Recent research has significantly contributed to multimodalinformation exchange and fusion, with many methods attempting to integrate unimodal features to generatemultimodal news representations. However, they still need to fully explore the hierarchical and complex semanticcorrelations between different modal contents, severely limiting their performance detecting multimodal falseinformation. This work proposes a two-stage detection framework for multimodal false information detection,called ASMFD, which is based on image aesthetic similarity to segment and explores the consistency andinconsistency features of images and texts. Specifically, we first use the Contrastive Language-Image Pre-training(CLIP) model to learn the relationship between text and images through label awareness and train an imageaesthetic attribute scorer using an aesthetic attribute dataset. Then, we calculate the aesthetic similarity betweenthe image and related images and use this similarity as a threshold to divide the multimodal correlation matrixinto consistency and inconsistencymatrices. Finally, the fusionmodule is designed to identify essential features fordetectingmultimodal false information. In extensive experiments on four datasets, the performance of the ASMFDis superior to state-of-the-art baseline methods. 展开更多
关键词 Social media false information detection image aesthetic assessment cross-modal consistency
在线阅读 下载PDF
Use of sensory substitution devices as a model system for investigating cross-modal neuroplasticity in humans 被引量:1
16
作者 Amy C.Nau Matthew C.Murphy Kevin C.Chan 《Neural Regeneration Research》 SCIE CAS CSCD 2015年第11期1717-1719,共3页
Blindness provides an unparalleled opportunity to study plasticity of the nervous system in humans.Seminal work in this area examined the often dramatic modifications to the visual cortex that result when visual input... Blindness provides an unparalleled opportunity to study plasticity of the nervous system in humans.Seminal work in this area examined the often dramatic modifications to the visual cortex that result when visual input is completely absent from birth or very early in life(Kupers and Ptito,2014).More recent studies explored what happens to the visual pathways in the context of acquired blindness.This is particularly relevant as the majority of diseases that cause vision loss occur in the elderly. 展开更多
关键词 Use of sensory substitution devices as a model system for investigating cross-modal neuroplasticity in humans BOLD
暂未订购
CSMCCVA:Framework of cross-modal semantic mapping based on cognitive computing of visual and auditory sensations 被引量:1
17
作者 刘扬 Zheng Fengbin Zuo Xianyu 《High Technology Letters》 EI CAS 2016年第1期90-98,共9页
Cross-modal semantic mapping and cross-media retrieval are key problems of the multimedia search engine.This study analyzes the hierarchy,the functionality,and the structure in the visual and auditory sensations of co... Cross-modal semantic mapping and cross-media retrieval are key problems of the multimedia search engine.This study analyzes the hierarchy,the functionality,and the structure in the visual and auditory sensations of cognitive system,and establishes a brain-like cross-modal semantic mapping framework based on cognitive computing of visual and auditory sensations.The mechanism of visual-auditory multisensory integration,selective attention in thalamo-cortical,emotional control in limbic system and the memory-enhancing in hippocampal were considered in the framework.Then,the algorithms of cross-modal semantic mapping were given.Experimental results show that the framework can be effectively applied to the cross-modal semantic mapping,and also provides an important significance for brain-like computing of non-von Neumann structure. 展开更多
关键词 multimedia neural cognitive computing (MNCC) brain-like computing cross-modal semantic mapping (CSM) selective attention limbic system multisensory integration memory-enhancing mechanism
在线阅读 下载PDF
Keypoints and Descriptors Based on Cross-Modality Information Fusion for Camera Localization
18
作者 MA Shuo GAO Yongbin+ +4 位作者 TIAN Fangzheng LU Junxin HUANG Bo GU Jia ZHOU Yilong 《Wuhan University Journal of Natural Sciences》 CAS CSCD 2021年第2期128-136,共9页
To address the problem that traditional keypoint detection methods are susceptible to complex backgrounds and local similarity of images resulting in inaccurate descriptor matching and bias in visual localization, key... To address the problem that traditional keypoint detection methods are susceptible to complex backgrounds and local similarity of images resulting in inaccurate descriptor matching and bias in visual localization, keypoints and descriptors based on cross-modality fusion are proposed and applied to the study of camera motion estimation. A convolutional neural network is used to detect the positions of keypoints and generate the corresponding descriptors, and the pyramid convolution is used to extract multi-scale features in the network. The problem of local similarity of images is solved by capturing local and global feature information and fusing the geometric position information of keypoints to generate descriptors. According to our experiments, the repeatability of our method is improved by 3.7%, and the homography estimation is improved by 1.6%. To demonstrate the practicability of the method, the visual odometry part of simultaneous localization and mapping is constructed and our method is 35% higher positioning accuracy than the traditional method. 展开更多
关键词 keypoints DESCRIPTORS cross-modality information global feature visual odometry
原文传递
Cross-Modal Hashing Retrieval Based on Deep Residual Network
19
作者 Zhiyi Li Xiaomian Xu +1 位作者 Du Zhang Peng Zhang 《Computer Systems Science & Engineering》 SCIE EI 2021年第2期383-405,共23页
In the era of big data rich inWe Media,the single mode retrieval system has been unable to meet people’s demand for information retrieval.This paper proposes a new solution to the problem of feature extraction and un... In the era of big data rich inWe Media,the single mode retrieval system has been unable to meet people’s demand for information retrieval.This paper proposes a new solution to the problem of feature extraction and unified mapping of different modes:A Cross-Modal Hashing retrieval algorithm based on Deep Residual Network(CMHR-DRN).The model construction is divided into two stages:The first stage is the feature extraction of different modal data,including the use of Deep Residual Network(DRN)to extract the image features,using the method of combining TF-IDF with the full connection network to extract the text features,and the obtained image and text features used as the input of the second stage.In the second stage,the image and text features are mapped into Hash functions by supervised learning,and the image and text features are mapped to the common binary Hamming space.In the process of mapping,the distance measurement of the original distance measurement and the common feature space are kept unchanged as far as possible to improve the accuracy of Cross-Modal Retrieval.In training the model,adaptive moment estimation(Adam)is used to calculate the adaptive learning rate of each parameter,and the stochastic gradient descent(SGD)is calculated to obtain the minimum loss function.The whole training process is completed on Caffe deep learning framework.Experiments show that the proposed algorithm CMHR-DRN based on Deep Residual Network has better retrieval performance and stronger advantages than other Cross-Modal algorithms CMFH,CMDN and CMSSH. 展开更多
关键词 Deep residual network cross-modal retrieval HASHING cross-modal hashing retrieval based on deep residual network
在线阅读 下载PDF
TECMH:Transformer-Based Cross-Modal Hashing For Fine-Grained Image-Text Retrieval
20
作者 Qiqi Li Longfei Ma +2 位作者 Zheng Jiang Mingyong Li Bo Jin 《Computers, Materials & Continua》 SCIE EI 2023年第5期3713-3728,共16页
In recent years,cross-modal hash retrieval has become a popular research field because of its advantages of high efficiency and low storage.Cross-modal retrieval technology can be applied to search engines,crossmodalm... In recent years,cross-modal hash retrieval has become a popular research field because of its advantages of high efficiency and low storage.Cross-modal retrieval technology can be applied to search engines,crossmodalmedical processing,etc.The existing main method is to use amulti-label matching paradigm to finish the retrieval tasks.However,such methods do not use fine-grained information in the multi-modal data,which may lead to suboptimal results.To avoid cross-modal matching turning into label matching,this paper proposes an end-to-end fine-grained cross-modal hash retrieval method,which can focus more on the fine-grained semantic information of multi-modal data.First,the method refines the image features and no longer uses multiple labels to represent text features but uses BERT for processing.Second,this method uses the inference capabilities of the transformer encoder to generate global fine-grained features.Finally,in order to better judge the effect of the fine-grained model,this paper uses the datasets in the image text matching field instead of the traditional label-matching datasets.This article experiment on Microsoft COCO(MS-COCO)and Flickr30K datasets and compare it with the previous classicalmethods.The experimental results show that this method can obtain more advanced results in the cross-modal hash retrieval field. 展开更多
关键词 Deep learning cross-modal retrieval hash learning TRANSFORMER
在线阅读 下载PDF
上一页 1 2 250 下一页 到第
使用帮助 返回顶部