120 articles found
Automatic clustering of single-molecule break junction data through task-oriented representation learning
1
Authors: Yi-Heng Zhao, Shen-Wen Pang, Heng-Zhi Huang, Shao-Wen Wu, Shao-Hua Sun, Zhen-Bing Liu, Zhi-Chao Pan. Rare Metals, 2025, Issue 5, pp. 3244-3257 (14 pages)
Clustering is a pivotal data analysis method for deciphering the charge transport properties of single molecules in break junction experiments. However, given the high dimensionality and variability of the data, feature extraction remains a bottleneck in the development of efficient clustering methods. In this regard, extensive research over the past two decades has focused on feature engineering and dimensionality reduction in break junction conductance. However, extracting highly relevant features without expert knowledge remains an unresolved challenge. To address this issue, we propose a deep clustering method driven by task-oriented representation learning (CTRL) in which the clustering module serves as a guide for the representation learning (RepL) module. First, we determine an optimal autoencoder (AE) structure through a neural architecture search (NAS) to ensure efficient RepL; second, the RepL process is guided by a joint training strategy that combines AE reconstruction loss with the clustering objective. The results demonstrate that CTRL achieves excellent performance on both the generated and experimental data. Further inspection of the RepL step reveals that joint training robustly learns more compact features than the unconstrained AE or traditional dimensionality reduction methods, significantly reducing misclustering possibilities. Our method provides a general end-to-end automatic clustering solution for analyzing single-molecule break junction data.
Keywords: Single-molecule conductance; Break junction; Deep clustering; Representation learning; Neural architecture search
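The joint training strategy described in the abstract above (AE reconstruction loss combined with a clustering objective) can be illustrated with a minimal sketch. The abstract does not state which clustering objective CTRL uses, so the k-means-style distance term, the function names, and the `weight` parameter below are all assumptions made for illustration only.

```python
def mse(x, x_hat):
    """Mean squared error between one input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def nearest_centroid_sq_dist(z, centroids):
    """Squared Euclidean distance from a latent vector to its closest centroid."""
    return min(sum((a - b) ** 2 for a, b in zip(z, c)) for c in centroids)


def joint_loss(inputs, recons, latents, centroids, weight=0.1):
    """Reconstruction loss plus a weighted clustering term.

    Assumption: the clustering term is a k-means-style distance in latent
    space; `weight` is a hypothetical trade-off hyperparameter.
    """
    rec = sum(mse(x, xh) for x, xh in zip(inputs, recons)) / len(inputs)
    clu = sum(nearest_centroid_sq_dist(z, centroids) for z in latents) / len(latents)
    return rec + weight * clu
```

With a weight of zero the sketch reduces to a plain autoencoder loss, which corresponds to the unconstrained AE that the abstract uses as a comparison.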
Multi-scale information fusion and decoupled representation learning for robust microbe-disease interaction prediction
2
Authors: Wentao Wang, Qiaoying Yan, Qingquan Liao, Xinyuan Jin, Yinyin Gong, Linlin Zhuo, Xiangzheng Fu, Dongsheng Cao. Journal of Pharmaceutical Analysis, 2025, Issue 8, pp. 1738-1752 (15 pages)
Research indicates that microbe activity within the human body significantly influences health by being closely linked to various diseases. Accurately predicting microbe-disease interactions (MDIs) offers critical insights for disease intervention and pharmaceutical research. Current advanced AI-based technologies automatically generate robust representations of microbes and diseases, enabling effective MDI predictions. However, these models continue to face significant challenges. A major issue is their reliance on complex feature extractors and classifiers, which substantially diminishes the models' generalizability. To address this, we introduce a novel graph autoencoder framework that utilizes decoupled representation learning and multi-scale information fusion strategies to efficiently infer potential MDIs. First, we randomly mask portions of the input microbe-disease graph based on a Bernoulli distribution to boost self-supervised training and minimize noise-related performance degradation. Second, we employ decoupled representation learning, compelling the graph neural network (GNN) to independently learn the weights for each feature subspace, thus enhancing its expressive power. Finally, we implement multi-scale information fusion to amalgamate the multi-layer outputs of the GNN, reducing information loss due to occlusion. Extensive experiments on public datasets demonstrate that our model significantly surpasses existing top MDI prediction models. This indicates that our model can accurately predict unknown MDIs and is likely to aid in disease discovery and precision pharmaceutical research. Code and data are accessible at: https://github.com/shmildsj/MDI-IFDRL.
Keywords: Microbe-disease interactions (MDIs); Pharmaceutical research; AI-based technologies; Decoupled representation learning; Multi-scale information fusion
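The Bernoulli masking step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the edge-list representation, the function name, and the `mask_prob` and `seed` parameters are assumptions.

```python
import random


def bernoulli_mask_edges(edges, mask_prob, seed=None):
    """Split an edge list into kept and masked edges.

    Each edge is masked independently with probability `mask_prob`,
    i.e. one Bernoulli trial per edge (an assumed reading of the abstract).
    """
    rng = random.Random(seed)
    kept, masked = [], []
    for edge in edges:
        # rng.random() is uniform on [0, 1), so this is a Bernoulli(mask_prob) draw.
        if rng.random() < mask_prob:
            masked.append(edge)
        else:
            kept.append(edge)
    return kept, masked
```

In a self-supervised setup of this kind, the kept edges would typically be fed to the encoder while the masked edges serve as reconstruction targets.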
LatentPINNs: Generative physics-informed neural networks via a latent representation learning
3
Authors: Mohammad H. Taufik, Tariq Alkhalifah. Artificial Intelligence in Geosciences, 2025, Issue 1, pp. 155-165 (11 pages)
Physics-informed neural networks (PINNs) are promising replacements for conventional mesh-based partial differential equation (PDE) solvers, offering more accurate and flexible PDE solutions. However, PINNs are hampered by relatively slow convergence and the need to perform additional, potentially expensive training for new PDE parameters. To address this limitation, we introduce LatentPINN, a framework that utilizes latent representations of the PDE parameters as additional inputs (alongside the coordinates) into PINNs and allows for training over the distribution of these parameters. Motivated by recent progress on generative models, we promote using latent diffusion models to learn compressed latent representations of the distribution of PDE parameters, as they act as input parameters for NN functional solutions. We use a two-stage training scheme in which, in the first stage, we learn the latent representations for the distribution of PDE parameters. In the second stage, we train a physics-informed neural network over inputs given by randomly drawn samples from the coordinate space within the solution domain and samples from the learned latent representation of the PDE parameters. Considering their importance in capturing evolving interfaces and fronts in various fields, we test the approach on a class of level set equations given, for example, by the nonlinear Eikonal equation. We share results corresponding to three sets of Eikonal parameters (velocity models). The proposed method performs well on new phase velocity models without the need for any additional training.
Keywords: Physics-informed neural networks; PDE solvers; Latent representation learning
Tri-party deep network representation learning using inductive matrix completion (cited by: 4)
4
Authors: YE Zhong-lin, ZHAO Hai-xing, ZHANG Ke, ZHU Yu, XIAO Yu-zhi. Journal of Central South University (indexed: SCIE, EI, CAS, CSCD), 2019, Issue 10, pp. 2746-2758 (13 pages)
Most existing network representation learning algorithms focus on network structures for learning. However, network structure is only one kind of view and feature for various networks, and it cannot fully reflect all characteristics of networks. In fact, network vertices usually contain rich text information, which can be well utilized to learn text-enhanced network representations. Meanwhile, the Matrix-Forest Index (MFI) has shown high effectiveness and stability in link prediction tasks compared with other link prediction algorithms. Neither MFI nor Inductive Matrix Completion (IMC) is well integrated into the algorithmic frameworks of typical representation learning methods. Therefore, we proposed a novel semi-supervised algorithm, tri-party deep network representation learning using inductive matrix completion (TDNR). Based on the inductive matrix completion algorithm, TDNR incorporates text features, the link certainty degrees of existing edges, and the future link probabilities of non-existing edges into network representations. The experimental results demonstrated that TDNR outperforms other baselines on three real-world datasets. The visualizations of TDNR show that the proposed algorithm is more discriminative than other unsupervised approaches.
Keywords: network representation; network embedding; representation learning; matrix-forest index; inductive matrix completion
Contrastive Self-supervised Representation Learning Using Synthetic Data (cited by: 4)
5
Authors: Dong-Yu She, Kun Xu. International Journal of Automation and Computing (indexed: EI, CSCD), 2021, Issue 4, pp. 556-567 (12 pages)
Learning discriminative representations with deep neural networks often relies on massive labeled data, which is expensive and difficult to obtain in many real scenarios. As an alternative, self-supervised learning that leverages the input itself as supervision is strongly preferred for its soaring performance on visual representation learning. This paper introduces a contrastive self-supervised framework for learning generalizable representations on synthetic data that can be obtained easily with complete controllability. Specifically, we propose to optimize a contrastive learning task and a physical property prediction task simultaneously. Given the synthetic scene, the first task aims to maximize agreement between a pair of synthetic images generated by our proposed view sampling module, while the second task aims to predict three physical property maps, i.e., depth, instance contour maps, and surface normal maps. In addition, a feature-level domain adaptation technique with adversarial training is applied to reduce the domain difference between the realistic and the synthetic data. Experiments demonstrate that our proposed method achieves state-of-the-art performance on several visual recognition datasets.
Keywords: Self-supervised learning; contrastive learning; synthetic image; convolutional neural network; representation learning
Enhanced Deep Autoencoder Based Feature Representation Learning for Intelligent Intrusion Detection System (cited by: 3)
6
Authors: Thavavel Vaiyapuri, Adel Binbusayyis. Computers, Materials & Continua (indexed: SCIE, EI), 2021, Issue 9, pp. 3271-3288 (18 pages)
In the era of big data, learning discriminant feature representations from network traffic has been identified as an invariably essential task for improving the detection ability of an intrusion detection system (IDS). Owing to the lack of accurately labeled network traffic data, many unsupervised feature representation learning models have been proposed with state-of-the-art performance. Yet, these models fail to consider the classification error while learning the feature representation. Intuitively, the learnt feature representation may degrade the performance of the classification task. For the first time in the field of intrusion detection, this paper proposes an unsupervised IDS model leveraging the benefits of a deep autoencoder (DAE) for learning a robust feature representation and a one-class support vector machine (OCSVM) for finding a more compact decision hyperplane for intrusion detection. Specifically, the proposed model defines a new unified objective function to minimize the reconstruction and classification errors simultaneously. This contribution not only enables the model to support joint learning for feature representation and classifier training but also guides it to learn a robust feature representation that can improve the discrimination ability of the classifier for intrusion detection. Three sets of evaluation experiments are conducted to demonstrate the potential of the proposed model. First, the ablation evaluation on the benchmark dataset NSL-KDD validates the design decisions of the proposed model. Next, the performance evaluation on the recent intrusion dataset UNSW-NB15 demonstrates the stable performance of the proposed model. Finally, the comparative evaluation verifies the efficacy of the proposed model against recently published state-of-the-art methods.
Keywords: cybersecurity; network intrusion detection; deep learning; autoencoder; stacked autoencoder; feature representational learning; joint learning; one-class classifier; OCSVM
Early Detection of Diabetic Retinopathy Using Machine Intelligence through Deep Transfer and Representational Learning (cited by: 2)
7
Authors: Fouzia Nawaz, Muhammad Ramzan, Khalid Mehmood, Hikmat Ullah Khan, Saleem Hayat Khan, Muhammad Raheel Bhutta. Computers, Materials & Continua (indexed: SCIE, EI), 2021, Issue 2, pp. 1631-1645 (15 pages)
Diabetic retinopathy (DR) is a retinal disease that causes irreversible blindness. DR occurs due to the patient's high blood sugar level, and it is difficult to detect at an early stage because no symptoms appear initially. To prevent blindness, early detection and regular treatment are needed. Automated detection based on machine intelligence may assist the ophthalmologist in examining the patient's condition more accurately and efficiently. The purpose of this study is to produce an automated screening system for recognition and grading of diabetic retinopathy using machine learning through deep transfer and representational learning. The artificial intelligence technique used is transfer learning on the deep neural network Inception-v4. Two configuration variants of transfer learning are applied to Inception-v4: fine-tune mode and fixed feature extractor mode. Both configuration modes have achieved decent accuracy values, but the fine-tuning method outperforms the fixed feature extractor configuration mode. The fine-tune configuration mode has achieved 96.6% accuracy in early detection of DR and 97.7% accuracy in grading the disease and has outperformed the state-of-the-art methods in the relevant literature.
Keywords: Diabetic retinopathy; artificial intelligence; automated screening system; machine learning; deep neural network; transfer and representational learning
Homogeneity Analysis of Multiairport System Based on Airport Attributed Network Representation Learning (cited by: 2)
8
Authors: LIU Caihua, CAI Rui, FENG Xia, XU Tao. Transactions of Nanjing University of Aeronautics and Astronautics (indexed: EI, CSCD), 2021, Issue 4, pp. 616-624 (9 pages)
The homogeneity analysis of a multi-airport system can provide important decision-making support for route layout and cooperative operation. Existing research seldom analyzes the homogeneity of a multi-airport system from the perspective of route network analysis, and the attribute information of airport nodes in the airport route network is not appropriately integrated into the airport network. To solve this problem, a multi-airport system homogeneity analysis method based on airport attributed network representation learning is proposed. First, the route network of a multi-airport system with attribute information is constructed. If there are flights between two airports, an edge is added between them, and regional attribute information is added for each airport node. Second, the airport attributes and the airport network vector are represented separately. The airport attributes and the airport network vector are embedded into a unified airport representation vector space by the network representation learning method, yielding an airport vector that integrates the airport attributes and the airport network characteristics. By calculating the similarity of the airport vectors, it is convenient to calculate the degree of homogeneity between airports and the homogeneity of the multi-airport system. The experimental results on the Beijing-Tianjin-Hebei multi-airport system show that, compared with other existing algorithms, the homogeneity analysis method based on attributed network representation learning produces results more consistent with the current situation of the Beijing-Tianjin-Hebei multi-airport system.
Keywords: air transportation; multi-airport system; homogeneity analysis; network representation learning; airport attribute network
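The final step of the abstract above, computing homogeneity from the similarity of airport vectors, can be sketched as below. The abstract does not name the similarity measure, so cosine similarity and the averaging over all airport pairs are assumptions; the function names and the example airport codes are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(vectors):
    """Average similarity over all airport pairs, used here as a system-level score.

    `vectors` maps an airport code to its embedding; averaging over pairs is
    an assumed aggregation, not one stated in the abstract.
    """
    names = sorted(vectors)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    if not pairs:
        raise ValueError("need at least two airports")
    total = sum(cosine_similarity(vectors[a], vectors[b]) for a, b in pairs)
    return total / len(pairs)
```

A pairwise score near 1 would indicate two airports with very similar embeddings, which under this reading corresponds to a high degree of homogeneity.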
Chinese word segmentation with local and global context representation learning (cited by: 2)
9
Authors: 李岩, Zhang Yinghua, Huang Xiaoping, Yin Xucheng, Hao Hongwei. High Technology Letters (indexed: EI, CAS), 2015, Issue 1, pp. 71-77 (7 pages)
A local and global context representation learning model for Chinese characters is designed, and a Chinese word segmentation method based on character representations is proposed in this paper. First, the proposed Chinese character learning model uses the semantics of local context and global context to learn the representation of Chinese characters. Then, a Chinese word segmentation model is built with a neural network, and the segmentation model is trained with the character representations as its input features. Finally, experimental results show that the Chinese character representations can effectively learn semantic information. Characters with similar semantics cluster together in the visualized space. Moreover, the proposed Chinese word segmentation model also improves precision, recall, and F-measure.
Keywords: local and global context; representation learning; Chinese character representation; Chinese word segmentation
A malware propagation prediction model based on representation learning and graph convolutional networks
10
Authors: Tun Li, Yanbing Liu, Qilie Liu, Wei Xu, Yunpeng Xiao, Hong Liu. Digital Communications and Networks (indexed: SCIE, CSCD), 2023, Issue 5, pp. 1090-1100 (11 pages)
Traditional malware research mainly takes recognition and detection as its breakthrough point, without focusing on propagation trends or predicting the subsequently infected nodes. The complexity of network structure, diversity of network nodes, and sparsity of data all pose difficulties in predicting propagation. This paper proposes a malware propagation prediction model based on representation learning and Graph Convolutional Networks (GCN) to address these problems. First, to solve the problem of inaccurate infection intensity calculation caused by the sparsity of node interaction behavior data in the malware propagation network, a tensor-based mechanism for mining the infection intensity among nodes is proposed to retain the network structure information. The influence of the relationship between nodes on the infection intensity is also analyzed. Second, given the diversity and complexity of the content and structure of infected and normal nodes in the network, and considering the advantages of representation learning in data feature extraction, a corresponding representation learning method is adopted for the characteristics of infection intensity among nodes. This can efficiently calculate the relationship between entities and relationships in a low-dimensional space to achieve low-dimensional, dense, and real-valued representation learning for the characteristics of propagation spatial data. We also design a new method, Tensor2vec, to learn the potential structural features of malware propagation. Finally, considering the convolution ability of GCN for non-Euclidean data, we propose a dynamic prediction model of malware propagation based on representation learning and GCN to solve the time effectiveness problem of the malware propagation carrier. The experimental results show that the proposed model can effectively predict the behaviors of the nodes in the network and discover the influence of different node characteristics on the malware propagation situation.
Keywords: Malware; Representation learning; Graph convolutional networks (GCN); Tensor decomposition; Propagation prediction
Improved Density Peaking Algorithm for Community Detection Based on Graph Representation Learning
11
Authors: Jiaming Wang, Xiaolan Xie, Xiaochun Cheng, Yuhan Wang. Computer Systems Science & Engineering (indexed: SCIE, EI), 2022, Issue 12, pp. 997-1008 (12 pages)
There is a large amount of information in network data that we can exploit. It is difficult for classical community detection algorithms to handle network data with sparse topology. Representation learning of network data is usually paired with clustering algorithms to solve the community detection problem. Meanwhile, the class clusters output by graph representation learning always have an unpredictable distribution. Therefore, we propose an improved density peak clustering algorithm (ILDPC) for the community detection problem, which improves the local density mechanism of the original algorithm and can better accommodate class clusters of different shapes. We also study community detection in network data. The algorithm is paired with the benchmark model Graph sample and aggregate (GraphSAGE) to show the adaptability of ILDPC for community detection. The plotted decision diagram shows that the ILDPC algorithm is more discriminative in selecting density peak points than the original algorithm. Finally, the performance of K-means and other clustering algorithms on this benchmark model is compared, and the algorithm is shown to be more suitable for community detection in sparse networks with the benchmark model on the F1-score evaluation criterion. The sensitivity of the parameters of the ILDPC algorithm to the low-dimensional vector set output by the benchmark model GraphSAGE is also analyzed.
Keywords: representation learning; data mining; low-dimensional embedding; community detection; density peaking algorithm
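For context on the decision diagram mentioned in the abstract above, the sketch below computes the two quantities used by the original density peak clustering algorithm: a local density (number of neighbors within a cutoff distance) and the distance to the nearest point of higher density. It does not reproduce the ILDPC modification, which the abstract does not detail; the function name and the `cutoff` parameter are assumptions.

```python
import math


def density_peak_scores(points, cutoff):
    """Return (rho, delta) for each point, as in the original density peak method.

    rho[i]   = number of other points closer than `cutoff` to point i.
    delta[i] = distance to the nearest point with strictly higher rho; for
               points with no denser neighbor, the largest distance from i.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta
```

Points with both large `rho` and large `delta` are the candidates for cluster centers; plotting `delta` against `rho` gives the decision diagram.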
GNN Representation Learning and Multi-Objective Variable Neighborhood Search Algorithm for Wind Farm Layout Optimization
12
Authors: Yingchao Li, Jianbin Wang, Haibin Wang. Energy Engineering (indexed: EI), 2024, Issue 4, pp. 1049-1065 (17 pages)
With the increasing demand for electrical services, wind farm layout optimization has been one of the biggest challenges that we have to deal with. Despite the promising performance of the heuristic algorithm on the route network design problem, the expressive capability and search performance of the algorithm on multi-objective problems remain unexplored. In this paper, the wind farm layout optimization problem is defined. Then, a multi-objective algorithm based on a Graph Neural Network (GNN) and the Variable Neighborhood Search (VNS) algorithm is proposed. The GNN provides the basis representations for the subsequent search algorithm so that the expressiveness and search accuracy of the algorithm can be improved. The multi-objective VNS algorithm is put forward by combining VNS with a multi-objective optimization algorithm to solve problems with multiple objectives. The proposed algorithm is applied to an 18-node simulation example to evaluate the feasibility and practicality of the developed optimization strategy. The experiment on the simulation example shows that the proposed algorithm yields a reduction of 6.1% in Point of Common Coupling (PCC) over the current state-of-the-art algorithm, which means that the proposed algorithm designs a layout that improves the quality of the power supply by 6.1% at the same cost. The ablation experiments show that the proposed algorithm improves the power quality by more than 8.6% and 7.8% compared to both the original VNS algorithm and the multi-objective VNS algorithm.
Keywords: GNN representation learning; variable neighborhood search; multi-objective optimization; wind farm layout; point of common coupling
Meta-Path-Based Deep Representation Learning for Personalized Point of Interest Recommendation
13
Authors: LI Zhong, WU Meimei. Journal of Donghua University (English Edition) (indexed: CAS), 2021, Issue 4, pp. 310-322 (13 pages)
With the wide application of location-based social networks (LBSNs), personalized point of interest (POI) recommendation has become popular, especially in the commercial field. Unfortunately, it is challenging to accurately recommend POIs to users because the user-POI matrix is extremely sparse. In addition, a user's check-in activities are affected by many influential factors. However, most existing studies capture only a few influential factors, and they are hard to extend to incorporate other heterogeneous information in a unified way. To address these problems, we propose a meta-path-based deep representation learning (MPDRL) model for personalized POI recommendation. In this model, we design eight types of meta-paths to fully utilize the rich heterogeneous information in LBSNs for the representations of users and POIs, and deeply mine the correlations between users and POIs. To further improve the recommendation performance, we design an attention-based long short-term memory (LSTM) network to learn the importance of different influential factors on a user's specific check-in activity. To verify the effectiveness of our proposed method, we conduct extensive experiments on a real-world dataset, Foursquare. Experimental results show that the MPDRL model improves by at least 16.97% and 23.55% over all comparison methods in terms of Precision@N (Pre@N) and Recall@N (Rec@N), respectively.
Keywords: meta-path; location-based recommendation; heterogeneous information network (HIN); deep representation learning
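The Precision@N and Recall@N metrics reported in the abstract above follow a standard definition, sketched here for a single user. The function name and the example POI identifiers are hypothetical; how the paper averages the metrics across users is not stated in the abstract.

```python
def precision_recall_at_n(recommended, relevant, n):
    """Precision@N and Recall@N for one user's ranked recommendation list.

    recommended: POIs ranked from most to least recommended.
    relevant:    set of POIs the user actually visited in the test data.
    """
    top_n = recommended[:n]
    hits = sum(1 for item in top_n if item in relevant)
    precision = hits / n if n else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, with `recommended = ["a", "b", "c", "d"]`, `relevant = {"a", "d", "e"}`, and `n = 2`, only `"a"` is a hit, giving a precision of 1/2 and a recall of 1/3.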
Heterogeneous graph construction and node representation learning method of Treatise on Febrile Diseases based on graph convolutional network
14
Authors: YAN Junfeng, WEN Zhihua, ZOU Beiji. Digital Chinese Medicine, 2022, Issue 4, pp. 419-428 (10 pages)
Objective: To construct a symptom-formula-herb heterogeneous graph structured dataset of the Treatise on Febrile Diseases (Shang Han Lun, 《伤寒论》) and explore an optimal learning method represented with node attributes based on a graph convolutional network (GCN). Methods: Clauses that contain symptoms, formulas, and herbs were abstracted from the Treatise on Febrile Diseases to construct symptom-formula-herb heterogeneous graphs, which were used to propose a node representation learning method based on GCN, the Traditional Chinese Medicine Graph Convolution Network (TCM-GCN). The symptom-formula, symptom-herb, and formula-herb heterogeneous graphs were processed with the TCM-GCN to realize high-order message passing and neighbor aggregation, obtaining new node representation attributes and thus the nodes' sum-aggregations of symptoms, formulas, and herbs, which lay a foundation for the downstream tasks of the prediction models. Results: Comparisons among the node representations with multi-hot encoding, non-fusion encoding, and fusion encoding showed that the Precision@10, Recall@10, and F1-score@10 of the fusion encoding were 9.77%, 6.65%, and 8.30% higher, respectively, than those of the non-fusion encoding in the prediction studies of the model. Conclusion: Node representations by fusion encoding achieved comparatively ideal results, indicating that the TCM-GCN is effective in realizing node-level representations of the heterogeneous graph structured Treatise on Febrile Diseases dataset and is able to improve the performance of the downstream tasks of the diagnosis model.
Keywords: Graph convolutional network (GCN); Heterogeneous graph; Treatise on Febrile Diseases (Shang Han Lun, 《伤寒论》); Node representations on heterogeneous graph; Node representation learning
Performance vs. Complexity Comparative Analysis of Multimodal Bilinear Pooling Fusion Approaches for Deep Learning-Based Visual Arabic-Question Answering Systems
15
Authors: Sarah M. Kamel, Mai A. Fadel, Lamiaa Elrefaei, Shimaa I. Hassan. Computer Modeling in Engineering & Sciences, 2025, Issue 4, pp. 373-411 (39 pages)
Visual question answering (VQA) is a multimodal task, involving a deep understanding of the image scene and the question's meaning and capturing the relevant correlations between both modalities to infer the appropriate answer. In this paper, we propose a VQA system intended to answer yes/no questions about real-world images, in Arabic. To support a robust VQA system, we work in two directions: (1) using deep neural networks, namely ResNet-152 and Gated Recurrent Units (GRU), to semantically represent the given image and question in a fine-grained manner; (2) studying the role of the utilized multimodal bilinear pooling fusion technique in the trade-off between the model complexity and the overall model performance. Some fusion techniques could significantly increase the model complexity, which seriously limits their applicability for VQA models. So far, there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions. Hence, a comparative analysis is conducted between eight bilinear pooling fusion techniques, in terms of their ability to reduce the model complexity and improve the model performance in this case of VQA systems. Experiments indicate that these multimodal bilinear pooling fusion techniques have improved the VQA model's performance, reaching a best performance of 89.25%. Further, experiments have shown that the number of answers in the developed VQA system is a critical factor that affects the effectiveness of these multimodal bilinear pooling techniques in achieving their main objective of reducing the model complexity. The Multimodal Local Perception Bilinear Pooling (MLPB) technique has shown the best balance between the model complexity and its performance for VQA systems designed to answer yes/no questions.
Keywords: Arabic-VQA; deep learning-based VQA; deep multimodal information fusion; multimodal representation learning; VQA of yes/no questions; VQA model complexity; VQA model performance; performance-complexity trade-off
Robust Audio-Visual Fusion for Emotion Recognition Based on Cross-Modal Learning under Noisy Conditions
16
作者 A-Seong Moon Seungyeon Jeong +3 位作者 Donghee Kim Mohd Asyraf Zulkifley Bong-Soo Sohn Jaesung Lee 《Computers, Materials & Continua》 2025年第11期2851-2872,共22页
Emotion recognition under uncontrolled and noisy environments presents persistent challenges in the design of emotionally responsive systems.The current study introduces an audio-visual recognition framework designed ... Emotion recognition under uncontrolled and noisy environments presents persistent challenges in the design of emotionally responsive systems.The current study introduces an audio-visual recognition framework designed to address performance degradation caused by environmental interference,such as background noise,overlapping speech,and visual obstructions.The proposed framework employs a structured fusion approach,combining early-stage feature-level integration with decision-level coordination guided by temporal attention mechanisms.Audio data are transformed into mel-spectrogram representations,and visual data are represented as raw frame sequences.Spatial and temporal features are extracted through convolutional and transformer-based encoders,allowing the framework to capture complementary and hierarchical information fromboth sources.Across-modal attentionmodule enables selective emphasis on relevant signals while suppressing modality-specific noise.Performance is validated on a modified version of the AFEW dataset,in which controlled noise is introduced to emulate realistic conditions.The framework achieves higher classification accuracy than comparative baselines,confirming increased robustness under conditions of cross-modal disruption.This result demonstrates the suitability of the proposed method for deployment in practical emotion-aware technologies operating outside controlled environments.The study also contributes a systematic approach to fusion design and supports further exploration in the direction of resilientmultimodal emotion analysis frameworks.The source code is publicly available at https://github.com/asmoon002/AVER(accessed on 18 August 2025). 展开更多
Keywords: Multimodal learning; emotion recognition; cross-modal attention; robust representation learning
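The cross-modal attention step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the URL given in the abstract); it is plain scaled dot-product attention in which audio tokens act as queries over visual tokens. The function name `cross_modal_attention`, the absence of learned projection matrices, and the feature shapes are all assumptions made for illustration.

```python
import numpy as np


def cross_modal_attention(audio, visual):
    """Audio tokens (queries) attend over visual tokens (keys and values).

    audio:  array of shape (Ta, d), one feature vector per audio step
    visual: array of shape (Tv, d), one feature vector per video frame
    Hypothetical sketch: no learned projections, single head.
    """
    d = audio.shape[-1]
    scores = audio @ visual.T / np.sqrt(d)         # (Ta, Tv) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    fused = weights @ visual                       # (Ta, d) visual context per audio step
    return fused, weights


# Example with random features: 4 audio steps, 6 video frames, 8 dimensions.
rng = np.random.default_rng(0)
fused, weights = cross_modal_attention(rng.normal(size=(4, 8)), rng.normal(size=(6, 8)))
```

Each row of `weights` shows how strongly one audio step draws on each video frame, which is the mechanism that lets a model down-weight frames that are occluded or otherwise uninformative.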
Deep Audio-visual Learning: A Survey (cited by: 5)
17
Authors: Hao Zhu, Man-Di Luo, Rui Wang, Ai-Hua Zheng, Ran He. International Journal of Automation and Computing (EI, CSCD), 2021, No. 3, pp. 351-376 (26 pages)
Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully. Researchers tend to leverage these two modalities to improve the performance of previously considered single-modality tasks or address new challenging problems. In this paper, we provide a comprehensive survey of recent audio-visual learning development. We divide the current audio-visual learning tasks into four different subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual representation learning. State-of-the-art methods, as well as the remaining challenges of each subfield, are further discussed. Finally, we summarize the commonly used datasets and challenges.
Keywords: Deep audio-visual learning; audio-visual separation and localization; correspondence learning; generative models; representation learning
Intelligent representation method of image flatness for cold rolled strip (cited by: 2)
18
Authors: Yang-huan Xu, Dong-cheng Wang, Hong-min Liu, Bo-wei Duan. Journal of Iron and Steel Research International (SCIE, EI, CAS, CSCD), 2024, No. 5, pp. 1177-1195 (19 pages)
Real flatness images are the basis for machine-vision flatness detection of cold rolled strip. The characteristics of a real flatness image are analyzed, and a lightweight strip location detection (SLD) model with deep semantic segmentation networks is established. The interference areas in the real flatness image can be eliminated by the SLD model, and valid information can be retained. On this basis, the concept of image flatness is proposed for the first time. An image flatness representation (IFAR) model is established on the basis of an autoencoder with a new structure. The optimal structure of the bottleneck layer is 16×16×4, and the IFAR model exhibits a good representation effect. Moreover, interpretability analysis of the representation factors is carried out, and the difference and physical meaning of the representation factors for image flatness with different categories are analyzed. Image flatness with new defect morphologies (bilateral quarter waves and large middle waves) that are not present in the original dataset is generated by modifying the representation factors of the no-wave image. Lastly, the SLD and IFAR models are used to detect and represent all the real flatness images in the test set. The average processing time for a single image is 11.42 ms, which is suitable for industrial applications. The research results provide effective methods and ideas for intelligent flatness detection technology based on machine vision.
Keywords: Cold rolled strip; Image flatness; Location detection; representation learning; Bottleneck layer
Domain-Invariant Similarity Activation Map Contrastive Learning for Retrieval-Based Long-Term Visual Localization (cited by: 2)
19
Authors: Hanjiang Hu, Hesheng Wang, Zhe Liu, Weidong Chen. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2022, No. 2, pp. 313-328 (16 pages)
Visual localization is a crucial component in the application of mobile robots and autonomous driving. Image retrieval is an efficient and effective technique in image-based localization methods. Due to the drastic variability of environmental conditions, e.g., illumination changes, retrieval-based visual localization is severely affected and becomes a challenging problem. In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation. Then, a novel gradient-weighted similarity activation mapping loss (Grad-SAM) is incorporated for finer localization with high accuracy. We also propose a new adaptive triplet loss to boost the contrastive learning of the embedding in a self-supervised manner. The final coarse-to-fine image retrieval pipeline is implemented as the sequential combination of models with and without Grad-SAM loss. Extensive experiments have been conducted to validate the effectiveness of the proposed approach on the CMU-Seasons dataset. The strong generalization ability of our approach is verified with the RobotCar dataset using models pre-trained on urban parts of the CMU-Seasons dataset. Our performance is on par with or even outperforms the state-of-the-art image-based localization baselines in medium or high precision, especially under challenging environments with illumination variance, vegetation, and night-time images. Moreover, real-site experiments have been conducted to validate the efficiency and effectiveness of the coarse-to-fine strategy for localization.
Keywords: Deep representation learning; place recognition; visual localization
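For readers unfamiliar with the triplet loss that the abstract above builds on, here is a minimal sketch of the standard fixed-margin form. It is not the adaptive triplet loss proposed in the paper, whose definition is not given in the abstract. The function names, the margin of 0.2, and the use of L2-normalised embeddings are assumptions for illustration only.

```python
import math


def l2_normalise(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss (hypothetical sketch, fixed margin).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`; otherwise it grows linearly.
    """
    a, p, n = (l2_normalise(v) for v in (anchor, positive, negative))
    return max(squared_distance(a, p) - squared_distance(a, n) + margin, 0.0)


# A well-separated triplet costs nothing; a reversed one is penalised.
easy = triplet_margin_loss([1.0, 0.0], [2.0, 0.0], [-1.0, 0.0])
hard = triplet_margin_loss([1.0, 0.0], [-1.0, 0.0], [1.0, 0.0])
```

In a retrieval setting the anchor and positive would be embeddings of the same place under different conditions (for example day and night), and the negative an embedding of a different place.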
Data-driven flatness intelligent representation method of cold rolled strip (cited by: 1)
20
Authors: Yang-huan Xu, Dong-cheng Wang, Bo-wei Duan, Hong-min Liu. Journal of Iron and Steel Research International (SCIE, EI, CAS, CSCD), 2023, No. 5, pp. 994-1012 (19 pages)
A high-accuracy flatness prediction model is the basis for realizing flatness control. Real flatness is typically reflected as the strain distribution, which is a vector. However, it is difficult to obtain ideal results if the real flatness is directly used as the output value of the flatness intelligent prediction model. Thus, it is necessary to seek an abstract representation method of real flatness. For this reason, two new intelligent flatness representation models were proposed based on the autoencoder of unsupervised learning theory: the flatness autoencoder representation (FAR) model and the flatness stacked sparse autoencoder representation (FSSAR) model. Compared with the traditional Legendre fourth-order polynomial representation model, the representation accuracies of the FAR and FSSAR models are significantly improved, better representing flatness defects such as the double tight edge. The optimal number of bottleneck layer neurons in the FAR and FSSAR models is 5, which means that five basic patterns can accurately represent real flatness. Compared with the FAR model, the FSSAR model has higher representation accuracy, although its flatness basic pattern is more abstract and its physical meaning is not clear enough. The accuracy of the FAR model is slightly lower than that of the FSSAR model; however, it can automatically learn the flatness basic pattern with a very clear physical meaning for both the theoretical and real flatness, which makes it an optimal intelligent representation method for flatness.
Keywords: Cold rolling flatness; Data-driven model; Unsupervised learning; representation learning; Autoencoder; Bottleneck layer
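The traditional baseline named in the abstract above, a fourth-order Legendre polynomial representation, can be sketched as a least-squares fit over the strip width. This is an illustrative sketch only, not the authors' code: the function name `legendre_flatness_fit`, the normalised width coordinate on [-1, 1], and the synthetic profile are assumptions. A degree-4 fit yields five coefficients.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_flatness_fit(flatness, degree=4):
    """Least-squares Legendre fit of a flatness profile (hypothetical sketch).

    flatness: 1-D sequence of flatness values sampled at evenly spaced
              positions across the strip width (assumed layout).
    Returns the coefficients, the reconstructed profile, and the RMSE.
    """
    flatness = np.asarray(flatness, dtype=float)
    # Normalised width coordinate: -1 at one strip edge, +1 at the other.
    x = np.linspace(-1.0, 1.0, flatness.size)
    coeffs = legendre.legfit(x, flatness, degree)
    reconstruction = legendre.legval(x, coeffs)
    rmse = float(np.sqrt(np.mean((flatness - reconstruction) ** 2)))
    return coeffs, reconstruction, rmse


# Synthetic profile built from second- and fourth-order Legendre terms.
x = np.linspace(-1.0, 1.0, 21)
profile = legendre.legval(x, [0.0, 0.0, 3.0, 0.0, -2.0])
coeffs, reconstruction, rmse = legendre_flatness_fit(profile)
```

The reconstruction error `rmse` gives a simple way to compare such a fixed polynomial basis against a learned representation on the same profiles.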