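RGB and thermal infrared (RGBT) tracking is a multimodal object tracking approach that combines information from two different sensors, visible light and thermal infrared. It aims to overcome the limitations of a single sensor in specific environments by fusing data from multiple sensors to improve tracking robustness and accuracy. However, most existing RGBT tracking algorithms directly fuse the features extracted from visible and thermal infrared images, ignoring the homogeneity and heterogeneity between the two modalities. In addition, RGBT tracking is often affected by multiple challenge factors such as fast target motion, scale variation, illumination variation, thermal crossover, and occlusion. Existing work tends to address all of these problems at once with a single structure, which requires a sufficiently complex model and a sufficiently large amount of training data. This paper proposes a new network for RGBT tracking that is oriented toward different challenges and combines the separation and fusion of multimodal homogeneous and heterogeneous information. In each backbone layer of the network, a challenge-aware module is designed to fuse the features from the visible and thermal infrared modalities under each challenge and to adaptively aggregate the fused features across all challenges. An attention enhancement module and a multi-scale auxiliary module are also added to enhance the features extracted by the backbone network. Finally, based on the homogeneity and heterogeneity of visible and thermal infrared data, modality-specific and shared features are extracted separately and adaptively fused. Extensive experiments on the GTOT, RGBT234, and LasHeR datasets show that the proposed tracker is highly competitive with existing RGBT tracking methods.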
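As a rough illustration of what "adaptively aggregating the fused features across all challenges" could look like, the sketch below weights each challenge-specific feature vector with a softmax over gating scores and sums the results. The abstract does not specify the aggregation mechanism, so the softmax gate, the function names, and the toy feature vectors are all assumptions made for illustration, not the paper's method.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_challenge_features(fused_features, gate_scores):
    """Weighted sum of per-challenge fused features (hypothetical helper).

    fused_features: dict mapping a challenge name to its fused RGB+thermal
                    feature vector (list of floats, all the same length).
    gate_scores:    dict mapping the same challenge names to a scalar score;
                    in a real network these would be predicted, here they
                    are supplied by the caller (assumption).
    Returns the aggregated vector and the normalized weight per challenge.
    """
    names = list(fused_features)
    weights = softmax([gate_scores[name] for name in names])
    dim = len(fused_features[names[0]])
    aggregated = [0.0] * dim
    for name, weight in zip(names, weights):
        for i, value in enumerate(fused_features[name]):
            aggregated[i] += weight * value
    return aggregated, dict(zip(names, weights))


if __name__ == "__main__":
    features = {
        "occlusion": [1.0, 2.0],
        "scale_variation": [3.0, 4.0],
        "thermal_crossover": [5.0, 6.0],
    }
    scores = {"occlusion": 2.0, "scale_variation": 0.5, "thermal_crossover": 0.1}
    vector, weights = aggregate_challenge_features(features, scores)
    print(weights)
    print(vector)
```

With equal gating scores the result reduces to a plain average of the challenge-specific features; unequal scores shift the aggregate toward the challenge the gate considers most relevant.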
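RGB and thermal (RGBT) object tracking benefits from the complementary strengths of visible and thermal infrared data, which improve a tracker's ability to localize targets in some extreme environments. Existing work focuses mainly on how to extract and fuse the features of the two modalities and overlooks the potential value of hierarchical deep features in the different modalities, which play an important role in target localization and classification. To address this, a multimodal adaptive fusion tracking algorithm with multi-layer feature interaction is proposed, the Multi-layer Feature Interaction and Modal-adaptation Fusion Network (MIMFNet). A feature extractor and an attention mechanism extract and adaptively calibrate the hierarchical features. A hierarchical feature aggregation subnetwork aggregates features from different layers in a top-down manner, so that low-level features retain their own spatial detail while also acquiring the semantic information of high-level features. A multimodal information transfer module is designed to adaptively fuse the hierarchical information of the two modalities, so that the model focuses on higher-quality feature channels. Extensive experiments on several public datasets show that the proposed algorithm resists interference well; in particular, tracking drift caused by scale variation (SV), thermal crossover (TC), and occlusion (OCC) is significantly reduced.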
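The top-down aggregation of hierarchical features can be pictured with a small sketch. It assumes 1-D feature maps, nearest-neighbor upsampling, and element-wise addition; none of these choices are stated in the abstract, and the function names are hypothetical. The point is only to show how a low-level map keeps its own resolution while receiving information from the higher levels.

```python
def upsample_nearest(feature, target_len):
    """Stretch a 1-D feature map to target_len by repeating nearest values."""
    source_len = len(feature)
    return [feature[i * source_len // target_len] for i in range(target_len)]


def top_down_aggregate(pyramid):
    """Aggregate features from the highest level down to the lowest.

    pyramid: list of 1-D feature maps ordered from low-level (longest,
             most spatial detail) to high-level (shortest, most semantic).
    Returns a list in the same order where every level has had the
    already-aggregated level above it upsampled and added in.
    """
    aggregated = [list(pyramid[-1])]  # the top level is kept unchanged
    for level in reversed(pyramid[:-1]):
        upsampled = upsample_nearest(aggregated[0], len(level))
        merged = [low + high for low, high in zip(level, upsampled)]
        aggregated.insert(0, merged)
    return aggregated


if __name__ == "__main__":
    pyramid = [
        [1.0, 1.0, 1.0, 1.0],  # low level: fine spatial detail
        [10.0, 20.0],          # middle level
        [100.0],               # high level: coarse semantics
    ]
    for level in top_down_aggregate(pyramid):
        print(level)
```

A real tracker would operate on 2-D multi-channel tensors and would usually apply a learned projection before the addition; the addition here stands in for whatever merge operation the network actually uses.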
Despite recent accomplishments in joint infrared-visible imaging, the bimodal defocus blur (BDB) phenomenon has received scant attention. Our analysis reveals that BDB is predominantly attributable to disparities in optical parameters between cameras, resulting in two primary challenges: incomplete single-modal information and difficulty in cross-modal information interaction. With regard to the former, the infrared modality is the primary victim, as the deblurring networks' bias toward high frequencies results in erroneous low-frequency reconstruction (e.g., over-sharpening). In the latter case, the relative nature of the blur effect can lead to ambiguity in determining which modality's information should be prioritized for guidance, and conflicts may arise between the clear components of the blurred image and the blurry components of the clear image. To address these issues, we propose the first de-bimodal defocus blur (DBDB) method, which consists of a low-frequency semantic hold (LSH) module with a pre-trained infrared model and a cross-modal complementary feature induction (CCFI) module driven by a max-min blur entropy loss. LSH is designed to ensure that the low-frequency information captured by the infrared modality does not contain any misleading data, while CCFI facilitates the acquisition of accurate information by means of adaptive adjustment and the loss function. The experimental results of deblurring and downstream tasks on two synthetic datasets demonstrate the superiority of our method.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62475016 and 62201058).