Funding: Supported by the National Natural Science Foundation of China (61572063, 61401308), the Fundamental Research Funds for the Central Universities (2016YJS039), the Natural Science Foundation of Hebei Province (F2016201142, F2016201187), the Natural Social Foundation of Hebei Province (HB15TQ015), the Science Research Project of Hebei Province (QN2016085, ZC2016040), and the Natural Science Foundation of Hebei University (2014-303).
Abstract: Fusion methods based on multi-scale transforms have become the mainstream of pixel-level image fusion. However, most of these methods cannot fully exploit the spatial-domain information of the source images, which leads to image degradation. This paper presents a fusion framework based on block-matching and 3D (BM3D) multi-scale transform. The algorithm first divides each image into blocks and groups similar 2D image blocks into 3D arrays. It then applies a 3D transform, consisting of a 2D multi-scale transform followed by a 1D transform, to convert the arrays into transform coefficients, and the resulting low- and high-frequency coefficients are fused under different fusion rules. The final fused image is obtained from the series of fused 3D block groups after the inverse transform, using an aggregation process. In the experimental part, we comparatively analyze existing algorithms and the use of different transforms, e.g., the non-subsampled contourlet transform (NSCT) and the non-subsampled shearlet transform (NSST), in the 3D transform step. Experimental results show that the proposed framework not only improves subjective visual quality but also achieves better objective evaluation scores than state-of-the-art methods.
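As a concrete illustration of the group-transform-fuse pipeline described above, here is a minimal NumPy sketch. It is not the paper's implementation: a separable DCT stands in for the 2D multi-scale (NSCT/NSST) plus 1D transform pair, block matching uses plain SSD, the function names are mine, and the final aggregation that averages overlapping fused blocks back into the image is omitted.

```python
import numpy as np
from scipy.fft import dctn, idctn

def match_group(img, ref_yx, block=8, stride=4, k=8):
    """Return top-left corners of the k blocks most similar (SSD) to the reference."""
    ry, rx = ref_yx
    ref = img[ry:ry + block, rx:rx + block]
    cands = []
    for y in range(0, img.shape[0] - block + 1, stride):
        for x in range(0, img.shape[1] - block + 1, stride):
            d = np.sum((img[y:y + block, x:x + block] - ref) ** 2)
            cands.append((d, y, x))
    cands.sort()
    return [(y, x) for _, y, x in cands[:k]]

def fuse_group(img_a, img_b, coords, block=8):
    """Stack matched blocks into 3D groups, fuse in the transform domain, invert."""
    ga = np.stack([img_a[y:y + block, x:x + block] for y, x in coords])
    gb = np.stack([img_b[y:y + block, x:x + block] for y, x in coords])
    ca, cb = dctn(ga), dctn(gb)                         # 3D transform of each group
    fused = np.where(np.abs(ca) >= np.abs(cb), ca, cb)  # max-abs rule for high frequencies
    fused.flat[0] = 0.5 * (ca.flat[0] + cb.flat[0])     # average rule for the DC term
    return idctn(fused)

rng = np.random.default_rng(0)
a, b = rng.random((64, 64)), rng.random((64, 64))       # toy source images
coords = match_group(a, (16, 16))
print(fuse_group(a, b, coords).shape)                   # (8, 8, 8): one fused 3D group
```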
Abstract: Mainstream gait recognition methods typically rely on stacked convolutional layers to progressively enlarge the receptive field and fuse local features. These methods mostly use shallow networks, which limits their ability to extract global features from gait images, and they pay little attention to temporal-periodic feature information. This paper therefore proposes a deep neural network that combines Transformer and 3D convolution (3D convolutional gait recognition network based on AdaptFormer and spect-conv, 3D-ASgaitNet). First, an initial residual convolution layer converts the binary silhouette data into floating-point encoded feature maps, providing dense low-level structural features. On this basis, a spectral layer strengthens feature extraction through joint frequency-domain and time-domain processing, and pseudo-3D residual convolution modules further extract high-level spatio-temporal features. Finally, an AdaptFormer module is integrated: through a lightweight downsampling-upsampling structure, it adapts to different data distributions and task requirements and provides flexible feature transformation. 3D-ASgaitNet was evaluated on four public datasets, two indoor (CASIA-B, OU-MVLP) and two outdoor (GREW, Gait3D), achieving recognition accuracies of 99.84%, 87.83%, 45.32%, and 72.12%, respectively. The results show that the proposed method approaches state-of-the-art accuracy on CASIA-B and Gait3D.
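The pseudo-3D residual convolution named above is a known factorization of a full 3D convolution into a spatial and a temporal convolution. The following PyTorch sketch shows that building block only; the channel count, kernel layout, and toy input shape are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class PseudoP3DBlock(nn.Module):
    """A 3D conv factorized into spatial (1x3x3) and temporal (3x1x1) convs."""
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(channels, channels, (3, 1, 1), padding=(1, 0, 0))
        self.bn1 = nn.BatchNorm3d(channels)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):               # x: (N, C, T, H, W) silhouette features
        out = self.relu(self.bn1(self.spatial(x)))
        out = self.bn2(self.temporal(out))
        return self.relu(out + x)       # residual connection

feats = torch.randn(2, 32, 30, 64, 44)  # a batch of 30-frame silhouette feature maps
print(PseudoP3DBlock(32)(feats).shape)  # torch.Size([2, 32, 30, 64, 44])
```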
Abstract: [Background] Conventional methods, with their static receptive-field design, struggle to accommodate the large scale differences among cars, pedestrians, and cyclists in urban autonomous-driving scenes, and cross-scale feature fusion is prone to inter-level interference. [Methods] Addressing the key challenge of cross-scale representation consistency in 3D detection of multi-class, multi-size targets in autonomous-driving scenes, this study proposes VoxTNT, a 3D object detection method with equalized receptive fields that improves detection performance through a local-global cooperative attention mechanism. At the local level, a PointSetFormer module introduces the Induced Set Attention Block (ISAB), which aggregates fine-grained geometric features from dense point clouds via reduced cross-attention, overcoming the information-loss bottleneck of conventional voxel mean pooling. At the global level, a VoxelFormerFFN module abstracts the non-empty voxels into a super point set and performs cross-voxel ISAB interaction to establish long-range contextual dependencies, compressing the computational load of global feature learning from O(N^2) to O(M^2) (M << N, where M is the number of non-empty voxels) and thereby avoiding the high complexity of applying a Transformer directly to the raw point cloud. This dual-domain coupled architecture dynamically balances local fine-grained perception with global semantic association, effectively mitigating the feature-modeling bias caused by fixed receptive fields and multi-scale fusion. [Results] Experiments show that, on the KITTI dataset, the method reaches 59.56% AP (Average Precision) for moderate-difficulty pedestrian detection in single-stage mode, about 12.4% above the SECOND baseline; in two-stage mode its overall mAP (mean Average Precision) of 66.54% leads the second-best method BSAODet at 66.10%. Its effectiveness is also verified on the WOD dataset, where the overall mAP reaches 66.09%, exceeding the SECOND and PointPillars baselines by 7.7% and 8.5%, respectively. Ablation studies further show that 3D feature learning with equalized local and global receptive fields markedly improves small-target accuracy (e.g., with all components ablated on KITTI, moderate-difficulty pedestrian and cyclist accuracy drops by 10.8% and 10.0%, respectively) while keeping large-target detection stable. [Conclusion] This study offers a new approach to multi-scale object detection in autonomous driving; future work will refine the model structure to further improve efficiency.
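The ISAB referenced above is the standard Set Transformer primitive: a small set of m learned inducing points first attends to the N inputs, and the inputs then attend back to those m summaries, reducing the cost from O(N^2) to O(mN). A minimal PyTorch sketch follows; the dimensions are illustrative, and the feed-forward and normalization sublayers of the full block are omitted for brevity.

```python
import torch
import torch.nn as nn

class ISAB(nn.Module):
    """Induced Set Attention Block: two cross-attentions through m inducing points."""
    def __init__(self, dim, num_inducing=16, heads=4):
        super().__init__()
        self.inducing = nn.Parameter(torch.randn(1, num_inducing, dim))
        self.attn1 = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn2 = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                    # x: (B, N, dim) point features
        ind = self.inducing.expand(x.size(0), -1, -1)
        h, _ = self.attn1(ind, x, x)         # inducing points attend to the set: O(mN)
        out, _ = self.attn2(x, h, h)         # the set attends back to m summaries: O(Nm)
        return out

pts = torch.randn(4, 500, 64)                # e.g., 500 points per voxel group
print(ISAB(64)(pts).shape)                   # torch.Size([4, 500, 64])
```

The same primitive serves both levels in the description above: within a voxel it summarizes dense points, and across voxels it operates on the M non-empty voxel features, which is what brings the global stage down to O(M^2)-scale interactions.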
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 62075241).
Abstract: Single-pixel imaging (SPI) can transform 2D or 3D image data into 1D light signals, which offers promising prospects for image compression and transmission. However, during data communication these light signals in public channels can easily draw the attention of eavesdroppers. Here, we introduce an efficient encryption method for SPI data transmission that uses the 3D Arnold transformation to directly scramble the 1D single-pixel light signals and the elliptic-curve encryption algorithm for key transmission. The scheme first employs Hadamard patterns to illuminate the scene and then applies the 3D Arnold transformation to permute the 1D light signal from single-pixel detection. The transformation parameters serve as the secret key, while the security of key exchange is guaranteed by an elliptic-curve-based key-exchange mechanism. Both computer simulations and optical experiments demonstrate that, compared with existing encryption schemes, the proposed technique not only enhances encryption security but also eliminates the need for complicated pattern-scrambling rules. Additionally, the approach solves the problem of secure key transmission, ensuring both the security of the information and the quality of the decrypted images.
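To make the permutation step concrete, here is a minimal NumPy sketch: the 1D single-pixel signal is reshaped into an n x n x n cube and scrambled by iterating an integer 3D Arnold (cat) map. The particular matrix and round count below are illustrative assumptions standing in for the paper's secret transformation parameters, and the function name is mine.

```python
import numpy as np

def arnold3d(signal, n, rounds=3):
    """Scramble a length-n**3 signal with an iterated 3D Arnold map."""
    A = np.array([[1, 1, 1],
                  [1, 2, 2],
                  [1, 2, 3]])                        # det = 1, so the map is a bijection mod n
    cube = signal.reshape(n, n, n)
    idx = np.indices((n, n, n)).reshape(3, -1)       # all (x, y, z) coordinates
    for _ in range(rounds):
        new = (A @ idx) % n                          # map each coordinate triple
        scrambled = np.empty_like(cube)
        scrambled[tuple(new)] = cube[tuple(idx)]
        cube = scrambled
    return cube.ravel()

sig = np.arange(8**3, dtype=float)                   # a toy 1D single-pixel measurement
enc = arnold3d(sig, 8)
print(np.array_equal(np.sort(enc), sig))             # True: a pure permutation of the signal
```

Decryption applies the inverse matrix (mod n) for the same number of rounds, so the matrix entries and round count together play the role of the key that the elliptic-curve mechanism then transports.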
Abstract: This paper proposes a network model that combines a Transformer with a graph network for 3D human pose estimation from video captured by visual sensors. The Transformer effectively extracts highly correlated spatio-temporal features from the 2D keypoints, while the graph network captures fine-grained local correlations; fusing the two architectures improves the accuracy of 3D pose estimation. Simulation experiments on the public Human3.6M dataset validate the fused Transformer-graph-convolution algorithm. The estimated 3D joints achieve a Mean Per Joint Position Error (MPJPE) of 38.4 mm, an improvement over existing methods, indicating that the approach has strong practical value and can support many downstream tasks.
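The graph side of this fusion can be illustrated with a single graph-convolution layer that propagates 2D keypoint features along the skeleton's adjacency. This is a generic sketch, not the paper's network: the 17-joint skeleton, the single example edge, the layer width, and the class name are all assumptions.

```python
import torch
import torch.nn as nn

class SkeletonGCN(nn.Module):
    """One graph-convolution layer over a fixed skeleton adjacency."""
    def __init__(self, adj, in_dim=2, out_dim=64):
        super().__init__()
        deg = adj.sum(1)
        # Row-normalize so each joint averages its neighbors' features.
        self.register_buffer("A_hat", adj / deg.clamp(min=1).unsqueeze(1))
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):                            # x: (B, J, 2) 2D keypoints
        return torch.relu(self.A_hat @ self.lin(x))  # aggregate along skeleton edges

J = 17
adj = torch.eye(J)                                   # self-loops; real edges follow the skeleton
adj[0, 1] = adj[1, 0] = 1                            # e.g., a pelvis-hip edge (illustrative)
gcn = SkeletonGCN(adj)
print(gcn(torch.randn(8, J, 2)).shape)               # torch.Size([8, 17, 64])
```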
Abstract: With the rapid development of computer vision and machine learning, 3D human pose estimation has become a prominent research direction. Early methods worked directly on images, but they required substantial computational resources and produced unsatisfactory results; 2D-to-3D methods emerged to overcome these problems. The best-performing 2D-to-3D methods are mostly Transformer-based, but they focus on global extraction of the human skeleton and ignore its local differences, so local information is learned insufficiently. This paper proposes a Transformer-based 3D human pose estimation algorithm that adds a local branch network to the global algorithm. In the local branch, a non-uniform graph convolutional network first extracts spatial semantic features from the 2D skeleton, helping the network learn the topological structure of the human body. A hierarchical local temporal network then learns subtle frame-to-frame differences at three levels: joints, body parts, and whole pose. In the global algorithm, spatial and temporal Transformers extract the distribution relations of all keypoints and all frames, respectively. In the lower layers, the local and global algorithms extract skeleton features in parallel; the higher layers consist of a cascade of the global algorithm. Evaluated with the MPJPE (Mean Per Joint Position Error) metric on the Human3.6M and MPI-INF-3DHP public datasets, the method achieves 20.8 mm and 22.3 mm, respectively, reaching a relatively high level of performance.
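One plausible reading of the hierarchical local temporal idea is sketched below under assumptions: frame-to-frame feature differences are taken per joint, then pooled into body parts and into the whole pose, yielding the three levels the abstract names. The joint-to-part grouping and function name are illustrative guesses, not the paper's partition.

```python
import torch

def hierarchical_temporal_diff(x, parts):
    """x: (B, T, J, C) per-joint features; parts: list of joint-index lists."""
    d_joint = x[:, 1:] - x[:, :-1]                   # (B, T-1, J, C) joint-level motion
    d_part = torch.stack([d_joint[:, :, p].mean(2) for p in parts], dim=2)
    d_pose = d_joint.mean(2, keepdim=True)           # (B, T-1, 1, C) whole-body motion
    return d_joint, d_part, d_pose

x = torch.randn(2, 9, 17, 32)                        # 9 frames, 17 joints, 32-dim features
parts = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10],     # an assumed 5-part grouping
         [11, 12, 13], [14, 15, 16]]
dj, dp, dg = hierarchical_temporal_diff(x, parts)
print(dj.shape, dp.shape, dg.shape)                  # (2,8,17,32) (2,8,5,32) (2,8,1,32)
```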