《Machine Intelligence Research》 CSCD

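Articles: 246 · Citations: 344 · H-index: 10
International Journal of Automation and Computing is a publication of Institute of Automation, the C...
  • Former name: International Journal of Automation and Computing (English edition)
  • Sponsor: Institute of Automation, Chinese Academy of Sciences
  • ISSN: 2731-538X
  • CN (domestic serial number): 10-1799/TP
  • Publication frequency: bimonthly
246 articles found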
Degree-aware Progressive Contrastive Learning for Graph Combinatorial Optimization Problems
1
Authors: Shiyun Zhao, Yang Wu, Yifan Zhang. 《Machine Intelligence Research》, 2025, Issue 6, pp. 1153-1166, 14 pages.
Addressing graph combinatorial optimization problems often poses significant challenges due to the difficulty and high cost of obtaining supervised labels. As a result, unsupervised algorithms have garnered increasing attention from researchers. In this paper, we propose a novel unsupervised framework that leverages contrastive learning to address these challenges. Drawing inspiration from traditional exact algorithms, we introduce a vertex-based degree-aware data augmentation method that enables the progressive learning of graph structure features. Furthermore, we incorporate optimal transport theory by using distance measures as the contrastive loss, thereby enhancing the model's ability to capture local graph structures. Extensive experiments demonstrate the superior performance of our approach in terms of both solution accuracy and inference speed on most graph combinatorial optimization problems, particularly in large-scale graph problems and scenarios where training samples are scarce.
Keywords: graph combinatorial optimization; contrastive learning; graph neural network; unsupervised learning; machine learning
OTB-morph: One-time Biometrics via Morphing (cited by: 1)
2
Authors: Mahdi Ghafourian, Julian Fierrez, Ruben Vera-Rodriguez, Aythami Morales, Ignacio Serna. 《Machine Intelligence Research》 (EI, CSCD), 2023, Issue 6, pp. 855-871, 17 pages.
Cancelable biometrics are a group of techniques to transform the input biometric to an irreversible feature intentionally using a transformation function and usually a key in order to provide security and privacy in biometric recognition systems. This transformation is repeatable, enabling subsequent biometric comparisons. This paper introduces a new idea to be exploited as a transformation function for cancelable biometrics aimed at protecting templates against iterative optimization attacks. Our proposed scheme is based on time-varying keys (random biometrics in our case) and morphing transformations. An experimental implementation of the proposed scheme is given for face biometrics. The results confirm that the proposed approach is able to withstand leakage attacks while improving the recognition performance.
Keywords: biometrics; face recognition; template protection; morphing; security
Federated Learning on Multimodal Data: A Comprehensive Survey (cited by: 3)
3
Authors: Yi-Ming Lin, Yuan Gao, Mao-Guo Gong, Si-Jia Zhang, Yuan-Qiao Zhang, Zhi-Yuan Li. 《Machine Intelligence Research》 (EI, CSCD), 2023, Issue 4, pp. 539-553, 15 pages.
With the growing awareness of data privacy, federated learning (FL) has gained increasing attention in recent years as a major paradigm for training models with privacy protection in mind, which allows building models in a collaborative but private way without exchanging data. However, most FL clients are currently unimodal. With the rise of edge computing, various types of sensors and wearable devices generate a large amount of data from different modalities, which has inspired research efforts in multimodal federated learning (MMFL). In this survey, we explore the area of MMFL to address the fundamental challenges of FL on multimodal data. First, we analyse the key motivations for MMFL. Second, the currently proposed MMFL methods are technically classified according to the modality distributions and modality annotations in MMFL. Then, we discuss the datasets and application scenarios of MMFL. Finally, we highlight the limitations and challenges of MMFL and provide insights and methods for future research.
Keywords: federated learning; multimodal learning; heterogeneous data; edge computing; collaborative learning
Paradigm Shift in Natural Language Processing (cited by: 11)
4
Authors: Tian-Xiang Sun, Xiang-Yang Liu, Xi-Peng Qiu, Xuan-Jing Huang. 《Machine Intelligence Research》 (EI, CSCD), 2022, Issue 3, pp. 169-183, 15 pages.
In the era of deep learning, modeling for most natural language processing (NLP) tasks has converged into several mainstream paradigms. For example, we usually adopt the sequence labeling paradigm to solve a bundle of tasks such as POS-tagging, named entity recognition (NER), and chunking, and adopt the classification paradigm to solve tasks like sentiment analysis. With the rapid progress of pre-trained language models, recent years have witnessed a rising trend of paradigm shift, that is, solving one NLP task in a new paradigm by reformulating the task. The paradigm shift has achieved great success on many tasks and is becoming a promising way to improve model performance. Moreover, some of these paradigms have shown great potential to unify a large number of NLP tasks, making it possible to build a single model to handle diverse tasks. In this paper, we review this phenomenon of paradigm shifts in recent years, highlighting several paradigms that have the potential to solve different NLP tasks.
Keywords: natural language processing; pre-trained language models; deep learning; sequence-to-sequence; paradigm shift
Motion-guided Visual Tracking
5
Authors: Pengyu Zhang, Simiao Lai, Dong Wang, Huchuan Lu. 《Machine Intelligence Research》, 2025, Issue 5, pp. 983-998, 16 pages.
Motion information is a crucial cue to build a robust tracker, especially in handling object occlusion and fast drift caused by cameras and objects. However, it has not been fully exploited. In this study, we attempt to exploit motion cues to guide visual trackers without bells and whistles. First, we decouple motion into two types: camera motion and object motion. Then, we predict them individually via the proposed camera motion modeling and object trajectory prediction. Each module contains a motion detector and a verifier. As for camera motion, we apply the off-the-shelf keypoint matching method to detect camera movement and propose a novel self-supervised camera motion verifier to validate its confidence. Given the previous object trajectory, object trajectory prediction aims to predict the future location of the target and select a reliable trajectory to handle fast object motion and occlusion. Numerous experiments on several mainstream tracking datasets, including OTB100, DTB70, TC128, UAV123, VOT2018 and GOT10k, demonstrate the effectiveness and generalization ability of our module, with real-time speed.
Keywords: visual tracking; motion embedding; camera motion modeling; self-supervised learning; trajectory prediction
Correction to: Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications (cited by: 1)
6
Authors: Wei Ji, Jingjing Li, Qi Bi, Tingwei Liu, Wenbo Li, Li Cheng. 《Machine Intelligence Research》 (EI, CSCD), 2024, Issue 6, p. 1215, 1 page.
Correction to: Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications. DOI: 10.1007/s11633-023-1385-0. Authors: Wei Ji, Jingjing Li, Qi Bi, Tingwei Liu, Wenbo Li, Li Cheng. The article Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications, written by Wei Ji, Jingjing Li, Qi Bi, Tingwei Liu, Wenbo Li, Li Cheng, was originally published without Open Access. After publication, the authors decided to opt for Open Choice and to make the article an Open Access publication. Therefore, the copyright of the article has been changed to The Author(s) 2024 and the article is forthwith distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords: perfect; open; license
Multimodal Fusion of Brain Imaging Data: Methods and Applications (cited by: 1)
7
Authors: Na Luo, Weiyang Shi, Zhengyi Yang, Ming Song, Tianzi Jiang. 《Machine Intelligence Research》 (EI, CSCD), 2024, Issue 1, pp. 136-152, 17 pages.
Neuroimaging data typically include multiple modalities, such as structural or functional magnetic resonance imaging, diffusion tensor imaging, and positron emission tomography, which provide multiple views for observing and analyzing the brain. To leverage the complementary representations of different modalities, multimodal fusion is consequently needed to dig out both inter-modality and intra-modality information. With the exploited rich information, it is becoming popular to combine multiple modality data to explore the structural and functional characteristics of the brain in both health and disease status. In this paper, we first review a wide spectrum of advanced machine learning methodologies for fusing multimodal brain imaging data, broadly categorized into unsupervised and supervised learning strategies. Following this, some representative applications are discussed, including how they help to understand the brain arealization, how they improve the prediction of behavioral phenotypes and brain aging, and how they accelerate the biomarker exploration of brain diseases. Finally, we discuss some exciting emerging trends and important future directions. Collectively, we intend to offer a comprehensive overview of brain imaging fusion methods and their successful applications, along with the challenges imposed by multi-scale and big data, which raise an urgent demand for developing new models and platforms.
Keywords: multimodal fusion; supervised learning; unsupervised learning; brain atlas; cognition; brain disorders
Correction to: The Life Cycle of Knowledge in Big Language Models: A Survey
8
Authors: Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun. 《Machine Intelligence Research》, 2025, Issue 6, p. 1167, 1 page.
Correction to: The Life Cycle of Knowledge in Big Language Models: A Survey. DOI: 10.1007/s11633-023-1416-x. Authors: Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun. The article The Life Cycle of Knowledge in Big Language Models: A Survey, written by Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun, was originally published without Open Access. After publication, the authors decided to opt for Open Choice and to make the article an Open Access publication.
Keywords: life cycle; knowledge; survey; open access publication; open choice; big language models
Interpretability of Neural Networks Based on Game-theoretic Interactions (cited by: 1)
9
Authors: Huilin Zhou, Jie Ren, Huiqi Deng, Xu Cheng, Jinpeng Zhang, Quanshi Zhang. 《Machine Intelligence Research》 (EI, CSCD), 2024, Issue 4, pp. 718-739, 22 pages.
This paper introduces the system of game-theoretic interactions, which connects both the explanation of knowledge encoded in a deep neural network (DNN) and the explanation of the representation power of a DNN. In this system, we define two game-theoretic interaction indexes, namely the multi-order interaction and the multivariate interaction. More crucially, we use these interaction indexes to explain feature representations encoded in a DNN from the following four aspects: (1) Quantifying knowledge concepts encoded by a DNN; (2) Exploring how a DNN encodes visual concepts, and extracting prototypical concepts encoded in the DNN; (3) Learning optimal baseline values for the Shapley value, and providing a unified perspective to compare fourteen different attribution methods; (4) Theoretically explaining the representation bottleneck of DNNs. Furthermore, we prove the relationship between the interaction encoded in a DNN and the representation power of a DNN (e.g., generalization power, adversarial transferability, and adversarial robustness). In this way, game-theoretic interactions successfully bridge the gap between "the explanation of knowledge concepts encoded in a DNN" and "the explanation of the representation capacity of a DNN" as a unified explanation.
Keywords: model interpretability and transparency; explainable AI; game theory; interaction; deep learning
GraphFlow+: Exploiting Conversation Flow in Conversational Machine Comprehension with Graph Neural Networks
10
Authors: Jing Hu, Lingfei Wu, Yu Chen, Po Hu, Mohammed J. Zaki. 《Machine Intelligence Research》 (EI, CSCD), 2024, Issue 2, pp. 272-282, 11 pages.
The conversation machine comprehension (MC) task aims to answer questions in the multi-turn conversation for a single passage. However, recent approaches do not exploit information from historical conversations effectively, so some references and ellipses in the current question cannot be recognized. In addition, these methods do not consider the rich semantic relationships between words when reasoning about the passage text. In this paper, we propose a novel model GraphFlow+, which constructs a context graph for each conversation turn and uses a unique recurrent graph neural network (GNN) to model the temporal dependencies between the context graphs of each turn. Specifically, we exploit three different ways to construct text graphs, including the dynamic graph, static graph, and hybrid graph that combines the two. Our experiments on CoQA, QuAC and DoQA show that the GraphFlow+ model can outperform the state-of-the-art approaches.
Keywords: conversational machine comprehension (MC); reading comprehension; question answering; graph neural networks (GNNs); natural language processing (NLP)
Prioritization Hindsight Experience Based on Spatial Position Attention for Robots
11
Authors: Ye Yuan, Yu Sha, Feixiang Sun, Haofan Lu, Shuiping Gou, Jie Luo. 《Machine Intelligence Research》, 2025, Issue 1, pp. 160-175, 16 pages.
Sparse rewards pose significant challenges in deep reinforcement learning as agents struggle to learn from experiences with limited reward signals. Hindsight experience replay (HER) addresses this problem by creating "small goals" within a hierarchical decision model. However, HER does not consider the value of different episodes for agent learning. In this paper, we propose SPAHER, a framework for prioritizing hindsight experiences based on spatial position attention. SPAHER allows the agent to prioritize more valuable experiences in a manipulation task. It achieves this by calculating transition and trajectory spatial position functions to determine the value of each episode for experience replays. We evaluate SPAHER on eight robot manipulation tasks in the Fetch and Hand environments provided by OpenAI Gym. Simulation results show that our method improves the final mean success rate by an average of 3.63% compared to HER, especially in challenging Hand environments. Notably, these improvements are achieved without any increase in computation time.
Keywords: hindsight experience replay; spatial position attention; sparse reward; deep reinforcement learning; prioritization hindsight experience
A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness, and Explainability (cited by: 2)
12
Authors: Enyan Dai, Tianxiang Zhao, Huaisheng Zhu, Junjie Xu, Zhimeng Guo, Hui Liu, Jiliang Tang, Suhang Wang. 《Machine Intelligence Research》 (EI, CSCD), 2024, Issue 6, pp. 1011-1061, 51 pages.
Graph neural networks (GNNs) have made rapid developments in recent years. Due to their great ability in modeling graph-structured data, GNNs are vastly used in various applications, including high-stakes scenarios such as financial analysis, traffic predictions, and drug discovery. Despite their great potential in benefiting humans in the real world, recent studies show that GNNs can leak private information, are vulnerable to adversarial attacks, can inherit and magnify societal bias from training data and lack interpretability, which risks causing unintentional harm to the users and society. For example, existing works demonstrate that attackers can fool the GNNs to give the outcome they desire with unnoticeable perturbation on training graph. GNNs trained on social networks may embed the discrimination in their decision process, strengthening the undesirable societal bias. Consequently, trustworthy GNNs in various aspects are emerging to prevent the harm from GNN models and increase the users' trust in GNNs. In this paper, we give a comprehensive survey of GNNs in the computational aspects of privacy, robustness, fairness, and explainability. For each aspect, we give the taxonomy of the related methods and formulate the general frameworks for the multiple categories of trustworthy GNNs. We also discuss the future research directions of each aspect and connections between these aspects to help achieve trustworthiness.
Keywords: graph neural networks (GNNs); trustworthy; privacy; robustness; fairness; explainability
Image De-occlusion via Event-enhanced Multi-modal Fusion Hybrid Network
13
Authors: Si-Qi Li, Yue Gao, Qiong-Hai Dai. 《Machine Intelligence Research》 (EI, CSCD), 2022, Issue 4, pp. 307-318, 12 pages.
Seeing through dense occlusions and reconstructing scene images is an important but challenging task. Traditional frame-based image de-occlusion methods may lead to fatal errors when facing extremely dense occlusions due to the lack of valid information available from the limited input occluded frames. Event cameras are bio-inspired vision sensors that record the brightness changes at each pixel asynchronously with high temporal resolution. However, synthesizing images solely from event streams is ill-posed since only the brightness changes are recorded in the event stream, and the initial brightness is unknown. In this paper, we propose an event-enhanced multi-modal fusion hybrid network for image de-occlusion, which uses event streams to provide complete scene information and frames to provide color and texture information. An event stream encoder based on the spiking neural network (SNN) is proposed to encode and denoise the event stream efficiently. A comparison loss is proposed to generate clearer results. Experimental results on a large-scale event-based and frame-based image de-occlusion dataset demonstrate that our proposed method achieves state-of-the-art performance.
Keywords: event camera; multi-modal fusion; image de-occlusion; spiking neural network (SNN); image reconstruction
Interpretable Visual Neural Decoding with Unsupervised Semantic Disentanglement
14
Authors: Qiongyi Zhou, Changde Du, Dan Li, Bincheng Wen, Le Chang, Huiguang He. 《Machine Intelligence Research》, 2025, Issue 3, pp. 553-570, 18 pages.
In the field of brain decoding research, reconstructing visual perception from neural recordings is a challenging but crucial task. With the use of superior algorithms, many methods have been dedicated to enhancing decoding performance. However, these models that map neural activities onto semantically entangled feature space are difficult to interpret. It is hard to understand the connections between neural activities and these abstract features. In this paper, we propose an interpretable neural decoding model that projects neural activities onto a semantically disentangled feature space with each dimension representing distinct visual attributes, such as gender and facial pose. A two-stage algorithm is designed to achieve this goal. First, a deep generative model learns semantically-disentangled image representations in an unsupervised way. Second, neural activities are linearly embedded into the semantic space, which the generator uses to reconstruct visual stimuli. Due to modality heterogeneity, it is challenging to learn such a neural embedded high-level semantic representation. We induce pixel, feature, and semantic alignment to ensure reconstruction quality. Three experimental fMRI datasets containing handwritten digits, characters, and human face stimuli are used to evaluate the neural decoding performance of our model. We also demonstrate the model interpretability through a reconstructed image editing application. The experimental results indicate that our model maintains a competitive decoding performance while remaining interpretable.
Keywords: visual neural decoding; disentangled representation learning; model interpretability; cross-modal generation; generative adversarial networks
Accelerated Elliptical PDE Solver for Computational Fluid Dynamics Based on Configurable U-Net Architecture: Analogy to V-Cycle Multigrid
15
Authors: Kiran Bhaganagar, David Chambers. 《Machine Intelligence Research》, 2025, Issue 2, pp. 324-336, 13 pages.
A configurable U-Net architecture is trained to solve the multi-scale elliptical partial differential equations. The motivation is to improve the computational cost of the numerical solution of Navier-Stokes equations, the governing equations for fluid dynamics. Building on the underlying concept of V-Cycle multigrid methods, a neural network framework using U-Net architecture is optimized to solve the Poisson equation and Helmholtz equations, the characteristic form of the discretized Navier-Stokes equations. The results demonstrate that the optimized U-Net captures the high dimensional mathematical features of the elliptical operator with a better convergence than the multigrid method. The optimal performance between the errors and the FLOPS is the (3, 2, 5) case with 3 stacks of U-Nets, 2 initial features, 5 depth layers and ELU activation. Further, by training the network with the multi-scale synthetic data, the finer features of the physical system are captured.
Keywords: configurable U-Net architecture; neural network methods for elliptical equations; multi-scale partial differential equations; Poisson and Helmholtz equation solvers; computational fluid dynamics solutions
Correction to: GraphFM: Graph Factorization Machines for Feature Interaction Modelling
16
Authors: Shu Wu, Zekun Li, Yunyue Su, Zeyu Cui, Xiaoyu Zhang, Liang Wang. 《Machine Intelligence Research》, 2025, Issue 6, p. 1168, 1 page.
Correction to: GraphFM: Graph Factorization Machines for Feature Interaction Modelling. DOI: 10.1007/s11633-024-1505-5. Authors: Shu Wu, Zekun Li, Yunyue Su, Zeyu Cui, Xiaoyu Zhang, Liang Wang. The article GraphFM: Graph Factorization Machines for Feature Interaction Modelling, written by Shu Wu, Zekun Li, Yunyue Su, Zeyu Cui, Xiaoyu Zhang, Liang Wang, was originally published without Open Access. After publication, the authors decided to opt for Open Choice and to make the article an Open Access publication.
Keywords: factorization machines; feature interaction modelling; open access; open access publication; GraphFM; graph factorization machines
Auto-3D-house Design from Structured User Requirements
17
Authors: Minkui Tan, Qi Chen, Zixiong Huang, Qi Wu, Yuanqing Li, Jiaqiu Zhou. 《Machine Intelligence Research》, 2025, Issue 2, pp. 368-385, 18 pages.
We study the task of automated house design, which aims to automatically generate 3D houses from user requirements. However, in the automatic system, it is non-trivial due to the intrinsic complexity of house designing: 1) the understanding of user requirements, where the users can hardly provide high-quality requirements without any professional knowledge; 2) the design of house plan, which mainly focuses on how to capture the effective information from user requirements. To address the above issues, we propose an automatic house design framework, called auto-3D-house design (A3HD). Unlike the previous works that consider the user requirements in an unstructured way (e.g., natural language), we carefully design a structured list that divides the requirements into three parts (i.e., layout, outline, and style), which focus on the attributes of rooms, the outline of the building, and the style of decoration, respectively. Following the processing of architects, we construct a bubble diagram (i.e., graph) that covers the rooms' attributes and relations under the constraint of outline. In addition, we take each outline as a combination of points and orders, ensuring that it can represent the outlines with arbitrary shapes. Then, we propose a graph feature generation module (GFGM) to capture layout features from the bubble diagrams and an outline feature generation module (OFGM) for outline features. Finally, we render 3D houses according to the given style requirements in a rule-based method. Experiments on two benchmark datasets (i.e., RPLAN and T3HM) demonstrate the effectiveness of our A3HD in terms of both quantitative and qualitative evaluation metrics.
Keywords: automated house design; user requirements understanding; outline processing; layout generation; graph feature generation
Biological Eagle-eye Inspired Target Detection for Unmanned Aerial Vehicles Equipped with a Manipulator
18
Authors: Yi-Min Deng, Si-Yuan Wang. 《Machine Intelligence Research》 (EI, CSCD), 2023, Issue 5, pp. 741-752, 12 pages.
Inspired by eagle eye mechanisms, the structure and information processing characteristics of the eagle's visual system are used for the target capture task of an unmanned aerial vehicle (UAV) with a mechanical arm. In this paper, a novel eagle-eye inspired multi-camera sensor and a saliency detection method are proposed. A combined camera system is built by simulating the double fovea structure on the eagle retina. A saliency target detection method based on the eagle midbrain inhibition mechanism is proposed by measuring the static saliency information and dynamic features. Thus, salient targets can be accurately detected through the collaborative work between different cameras of the proposed multi-camera sensor. Experimental results show that the eagle-eye inspired visual system is able to continuously detect targets in outdoor scenes and that the proposed algorithm has a strong inhibitory effect on moving background interference.
Keywords: unmanned aerial vehicle (UAV); eagle eye; multi-camera sensor; target detection; saliency detection
Cross-modal Contrastive Learning for Generalizable and Efficient Image-text Retrieval (cited by: 3)
19
Authors: Haoyu Lu, Yuqi Huo, Mingyu Ding, Nanyi Fei, Zhiwu Lu. 《Machine Intelligence Research》 (EI, CSCD), 2023, Issue 4, pp. 569-582, 14 pages.
Cross-modal image-text retrieval is a fundamental task in bridging vision and language. It faces two main challenges that are typically not well addressed in previous works. 1) Generalizability: Existing methods often assume a strong semantic correlation between each text-image pair, and are thus difficult to generalize to real-world scenarios where the weak correlation dominates. 2) Efficiency: Many latest works adopt the single-tower architecture with heavy detectors, which are inefficient during the inference stage because the costly computation needs to be repeated for each text-image pair. In this work, to overcome these two challenges, we propose a two-tower cross-modal contrastive learning (CMCL) framework. Specifically, we first devise a two-tower architecture, which enables a unified feature space for the text and image modalities to be directly compared with each other, alleviating the heavy computation during inference. We further introduce a simple yet effective module named multi-grid split (MGS) to learn fine-grained image features without using detectors. Last but not least, we deploy a cross-modal contrastive loss on the global image/text features to learn their weak correlation and thus achieve high generalizability. To validate that our CMCL can be readily generalized to real-world scenarios, we construct a large multi-source image-text dataset called weak semantic correlation dataset (WSCD). Extensive experiments show that our CMCL outperforms the state of the art while being much more efficient.
Keywords: image-text retrieval; multimodal modeling; contrastive learning; weak correlation; computer vision
Message from the EiC
20
Author: Tieniu Tan. 《Machine Intelligence Research》 (EI, CSCD), 2022, Issue 1, pp. 1-2, 2 pages.
Machine intelligence is a fast-developing field and often used as a synonym for artificial intelligence but with a stronger application orientation. Machine intelligence algorithms are not forms of alchemy, but rather useful tools for improving the world we live in. With easier access to data and more affordable computational power, discoveries in research labs are being transformed into real-world applications and therefore creating enormous social and economic values.
Keywords: artificial; message; intelligence