For some important object recognition applications such as intelligent robots and autonomous driving, images are collected consecutively and are correlated with one another; moreover, the scenes have stable prior features. Yet existing technologies do not take full advantage of this information. To push object recognition beyond existing algorithms in these applications, an object recognition method that fuses temporal-sequence information with scene prior information is proposed. The method first employs YOLOv3 as the base algorithm to recognize objects in single-frame images, then uses the DeepSort algorithm to associate candidate objects recognized at different moments, and finally applies the confidence fusion method and temporal boundary processing method designed herein to fuse temporal-sequence information with scene prior information at the decision level. Experiments on public datasets and self-built industrial-scene datasets show that, owing to the expanded information sources, the quality of individual frames has less impact on the recognition results, and object recognition is markedly improved. The approach is presented as a widely applicable multi-class information fusion framework: any object recognition algorithm that simultaneously outputs object class, location, and recognition confidence can be integrated into it to improve performance.
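The abstract does not give the fusion rule itself. As a minimal sketch of decision-level fusion over one DeepSort track, under the assumption (not stated in the paper) of exponential smoothing of per-class confidences followed by scene-prior reweighting:

```python
from collections import defaultdict

def fuse_track_confidences(detections, scene_prior, alpha=0.7):
    """Hypothetical decision-level fusion sketch, not the paper's exact rule.

    detections: list of (class_name, confidence) for one DeepSort track,
                ordered by frame time.
    scene_prior: dict mapping class_name -> prior probability that this
                 class appears in the (stable) scene.
    alpha: smoothing weight given to the most recent frame.
    """
    fused = defaultdict(float)
    for cls, conf in detections:
        # Exponential smoothing links confidences across the temporal sequence.
        fused[cls] = alpha * conf + (1 - alpha) * fused[cls]
    # Reweight by the scene prior, then renormalize to a distribution.
    weighted = {c: fused[c] * scene_prior.get(c, 0.05) for c in fused}
    total = sum(weighted.values()) or 1.0
    return {c: w / total for c, w in weighted.items()}

# Toy track: two "person" detections and one spurious "cart" detection.
track = [("person", 0.9), ("person", 0.4), ("cart", 0.5)]
prior = {"person": 0.8, "cart": 0.2}
scores = fuse_track_confidences(track, prior)
```

With temporal smoothing and a prior favoring "person" in this scene, the single noisy "cart" detection is suppressed relative to the repeated "person" evidence.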
Object detection is one of the most important tasks in remote sensing image interpretation. Most current deep-learning-based remote sensing object detection models rely on predefined anchor boxes and often ignore contextual information in the scene, which limits detection performance and generalization. To address this, a Scene Related Anchor-Free YOLO network (SRAF-YOLO) for remote sensing object detection is proposed. SRAF-YOLO first introduces a scene-enhanced multi-scale feature extraction module that fuses scene features with object features to generate scene-enhanced features rich in contextual information, and then applies multi-scale operations to extract multi-scale features carrying scene semantics, effectively injecting scene context. On this basis, a scene-assisted anchor-free detection head is designed that uses the scene information in the feature maps to constrain object class prediction, improving detection accuracy, while the anchor-free structure effectively reduces the computation associated with anchor-box parameters. Experimental results on the RSOD and NWPU VHR-10 datasets show that, by fusing scene information with an anchor-free mechanism, SRAF-YOLO improves detection accuracy, reaching mean average precision (mAP) of 94.58% and 95.95%, respectively, improvements of 1.51% and 3.0% over the YOLOv8 baseline, and outperforming the other compared methods. Validation on an external dataset further confirms the algorithm's good generalization ability.
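The paper's scene-enhancement module is not specified in the abstract; one simple way such scene-object fusion is commonly realized is to pool a global scene descriptor and concatenate it back onto every spatial location, sketched here with NumPy (a hypothetical stand-in, not the paper's actual module):

```python
import numpy as np

def scene_enhanced_features(fmap):
    """Sketch of scene-object feature fusion. fmap: (C, H, W) feature map.

    Global average pooling yields a scene descriptor; broadcasting it back
    over every spatial location and concatenating channel-wise gives each
    position access to scene-level context.
    """
    c, h, w = fmap.shape
    scene = fmap.mean(axis=(1, 2))                       # (C,) scene context
    scene_map = np.broadcast_to(scene[:, None, None], (c, h, w))
    return np.concatenate([fmap, scene_map], axis=0)     # (2C, H, W)

fused = scene_enhanced_features(np.random.rand(64, 20, 20))
```

Every output position then carries both its local object features and an identical copy of the global scene descriptor, which a detection head can use to condition class predictions on scene context.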
To address the problems that existing 3D visual grounding methods depend on expensive sensors, incur high system cost, and lack accuracy and robustness in complex multi-object localization, a monocular-image-based multi-object 3D visual grounding method is proposed. Combined with natural-language descriptions, it identifies multiple 3D objects in a single RGB image. To this end, a multi-object visual grounding dataset, Mmo3DRefer, is constructed, and a cross-modal matching network, TextVizNet, is designed. TextVizNet generates 3D bounding boxes for objects with a pretrained monocular detector and achieves deep integration of visual and linguistic information through an information fusion module and an information alignment module, thereby realizing text-guided multi-object 3D detection. Comparative experiments against five methods, including CORE-3DVG (Contextual Objects and RElations for 3D Visual Grounding), 3DVG-Transformer, and Multi3DRefer (Multiple 3D object Referencing dataset and task), show that relative to the second-best method, Multi3DRefer, TextVizNet improves F1-score, precision, and recall on the Mmo3DRefer dataset by 8.92%, 8.39%, and 9.57%, respectively, significantly improving text-based multi-object localization accuracy in complex scenes and providing effective support for practical applications such as autonomous driving and intelligent robots.
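For reference, the F1-score reported above is the harmonic mean of precision and recall, so joint gains in both metrics translate directly into an F1 gain:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The harmonic mean is dominated by the weaker of the two metrics.
balanced = f1_score(0.5, 0.5)    # 0.5
skewed = f1_score(0.6, 0.4)      # 0.48, below the arithmetic mean of 0.5
```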
Infrared scene simulation has extensive applications in military and civil fields. Based on a certain experimental environment, the object-oriented graphics rendering engine (OGRE) is used to simulate a real three-dimensional infrared complex scene. First, the radiation of each part of the target is calculated from our experimental data. Then, through analysis of the radiation characteristics of the targets and related materials, an infrared texture library is established and 3ds Max is applied to build an infrared radiation model. Finally, a realistic complex infrared scene is created using OGRE image rendering and graphics processing unit (GPU) programmable-pipeline technology. The results show that the simulated images are very similar to real images and are good supplements to real data.
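The abstract's target-radiation step relies on experimental data, but the physical core of any such calculation is blackbody radiance from Planck's law integrated over the sensor band. A minimal sketch of that standard computation (not the paper's specific radiation model):

```python
import math

H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
K = 1.380649e-23     # Boltzmann constant, J/K

def planck_radiance(wavelength_m: float, temp_k: float) -> float:
    """Blackbody spectral radiance, W / (sr * m^2 * m), from Planck's law."""
    a = 2.0 * H * C**2 / wavelength_m**5
    b = H * C / (wavelength_m * K * temp_k)
    return a / (math.exp(b) - 1.0)

def band_radiance(lo_um: float, hi_um: float, temp_k: float,
                  emissivity: float = 1.0, steps: int = 2000) -> float:
    """In-band radiance via trapezoidal integration over [lo_um, hi_um]."""
    lo, hi = lo_um * 1e-6, hi_um * 1e-6
    dx = (hi - lo) / steps
    total = 0.5 * (planck_radiance(lo, temp_k) + planck_radiance(hi, temp_k))
    for i in range(1, steps):
        total += planck_radiance(lo + i * dx, temp_k)
    return emissivity * total * dx

# Long-wave IR band (8-14 um) radiance of a 300 K graybody surface.
L = band_radiance(8.0, 14.0, 300.0, emissivity=0.95)
```

Values like this, computed per material temperature and emissivity, are what an infrared texture library maps onto scene geometry before rendering.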
To address the low detection accuracy and high miss rate caused by the complex environments encountered during automatic parking, an improved multi-object detection model for automatic parking scenes based on YOLOv5 (you only look once version 5) is proposed: PSMD-YOLOv5 (parking scene multi-target detection). A hybrid-pooling method is used to improve SPPF (spatial pyramid pooling-fast), strengthening the model's ability to extract global information and thus its detection performance. The BiFPN (bidirectional feature pyramid network) structure is introduced to adjust the relative importance of features at different scales, enhancing feature fusion. A CA (coordinate attention) mechanism combined with large convolution kernels is added to emphasize target regions of interest and further improve feature extraction. Finally, the CIoU (complete intersection over union) loss is replaced with the MPDIoU (minimum point distance based intersection over union) loss to speed up convergence and improve localization. Experimental results show that PSMD-YOLOv5 achieves precision, recall, and mAP of 81.8%, 80.3%, and 85.3% on an automatic parking scene dataset, improvements of 1%, 6.5%, and 5.2% over the original YOLOv5 model. Comparisons with other object detection models further verify the model's effectiveness.
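For the loss substitution above, the published MPDIoU formulation penalizes IoU by the squared distances between the two boxes' top-left and bottom-right corners, normalized by the squared image diagonal. A sketch of that metric (box coordinates and image size are illustrative):

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU: IoU minus normalized squared distances between the
    top-left and bottom-right corner pairs (per the published formulation).

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection-over-union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared corner distances, normalized by the image diagonal squared.
    d2_tl = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d2_br = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2
    diag2 = img_w ** 2 + img_h ** 2
    return iou - d2_tl / diag2 - d2_br / diag2

def mpdiou_loss(box_a, box_b, img_w, img_h):
    # Loss is 1 - MPDIoU: zero for identical boxes, above 1 when disjoint.
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)
```

Unlike plain IoU, the corner-distance terms keep the gradient informative even for non-overlapping boxes, which is the cited motivation for faster convergence.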
Object-based audio coding is the main technique in audio scene coding. It can effectively reconstruct each object's trajectory and provides sufficient flexibility for personalized audio scene reconstruction, so it has attracted increasing attention. However, existing object-based techniques suffer from poor sound quality because of low frequency-domain parameter resolution. To achieve high-quality audio object coding, we propose a new coding framework that introduces the non-negative matrix factorization (NMF) method. We extract object parameters at high resolution to improve sound quality, and apply NMF to parameter coding to reduce the high bitrate caused by that high resolution. Experimental results show that the proposed framework improves coding quality by 25%, providing a more flexible, higher-quality solution for encoding audio scenes.
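The bitrate-reduction idea rests on NMF's low-rank factorization: an m x n nonnegative parameter matrix is approximated by factors of size m x r and r x n, so only r*(m+n) values need coding instead of m*n. A minimal multiplicative-update NMF sketch (the paper's actual parameter coding pipeline is richer):

```python
import numpy as np

def nmf(V, rank, iters=300, eps=1e-9, seed=0):
    """Minimal Lee-Seung multiplicative-update NMF: V ~ W @ H.

    V: (m, n) nonnegative matrix (e.g. high-resolution object parameters).
    Returns nonnegative W (m, rank) and H (rank, n).
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(iters):
        # Multiplicative updates keep both factors nonnegative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "parameter matrix": rank-2 by construction, so a rank-2 NMF fits well.
rng = np.random.default_rng(1)
V = rng.random((40, 2)) @ rng.random((2, 100))
W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Here the factors hold 2*(40+100) = 280 values versus 4000 in V, illustrating how factorized parameters can be quantized and transmitted at a much lower rate.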
Funding: Supported by the National Twelfth Five-Year Project (40405050303).
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2015AA016306) and the National Natural Science Foundation of China (Nos. 61231015 and 61671335).