Funding: Supported by the National Natural Science Foundation of China (No. 60902067) and the Key Programs for Science and Technology Development of Jilin Province of China (No. 11ZDGG001).
Abstract: A novel unsupervised ship detection and extraction method is proposed. A combination model based on visual saliency is constructed to search for ship target regions and suppress false alarms. The salient target regions are extracted and marked through segmentation. The Radon transform is applied to confirm suspected ship targets with symmetric profiles. Then, a new descriptor, an improved histogram of oriented gradients (HOG), is introduced to discriminate real ships. Experimental results on real optical remote sensing images demonstrate that many ships can be extracted and located successfully, and that the number of ships can be obtained accurately. Furthermore, the proposed method is superior to the compared methods in terms of both accuracy rate and false alarm rate.
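The abstract does not say how the improved HOG descriptor differs from the standard one. As a point of reference only, a plain HOG cell histogram might be sketched as below. The function name `hog_cell_histogram`, the nine unsigned-orientation bins, and the central-difference gradients are assumptions for illustration, not the paper's descriptor.

```python
from math import atan2, degrees, hypot


def hog_cell_histogram(patch, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `patch` is a 2D list of grey levels. Border pixels are skipped because
    central differences need both neighbours. This is the standard HOG cell
    step, not the "improved" variant mentioned in the abstract.
    """
    height, width = len(patch), len(patch[0])
    bin_width = 180.0 / bins
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = hypot(gx, gy)
            # Fold the direction into [0, 180): opposite gradients share a bin.
            angle = degrees(atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A horizontal intensity ramp has gradients only along x (orientation 0).
ramp = [[x for x in range(4)] for _ in range(4)]
histogram = hog_cell_histogram(ramp)
```

In a full descriptor, histograms from neighbouring cells would be normalised in blocks and concatenated before classification.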
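Abstract: To meet the localization accuracy requirements of point-line feature based simultaneous localization and mapping (SLAM) algorithms during pose estimation, an improved monocular visual-inertial SLAM algorithm with efficient point-line flow features (EPLF-VINS) is proposed. First, the influence of the gradient threshold parameter on the line segment detection by edge drawing (EDLines) line extraction algorithm is analyzed. Second, after forward optical flow tracking of point features, backward optical flow tracking is used to reject incorrectly tracked points, which improves the tracking accuracy. Then, an adaptive adjustment algorithm is integrated into the line segment extraction stage of EPLF-VINS: the gradient threshold parameter is adjusted in real time from the point-feature tracking success rate computed after backward optical flow tracking, so that line segment extraction adapts dynamically to changes in the environment and better balances computational cost against localization accuracy. Finally, the trajectory accuracy and efficiency of the improved EPLF-VINS algorithm and the comparison algorithms are analyzed on the EuRoc and TUM-VI datasets using the Robot Operating System (ROS) platform. The results show that the trajectories produced by the improved EPLF-VINS algorithm follow the ground-truth trajectories more closely, achieving higher localization accuracy while maintaining real-time performance.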
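The forward-backward tracking check and the threshold adjustment described in this abstract can be illustrated with a small sketch. The abstract gives neither the error tolerance nor the rule that maps success rate to gradient threshold, so the 1-pixel tolerance, the linear mapping, the threshold range, and all function names below are hypothetical.

```python
from math import hypot


def forward_backward_mask(points, back_tracked, max_error=1.0):
    """True where a point tracked forward and then backward returns to
    within `max_error` pixels of where it started."""
    return [
        hypot(px - bx, py - by) <= max_error
        for (px, py), (bx, by) in zip(points, back_tracked)
    ]


def tracking_success_rate(mask):
    """Fraction of points that survived the backward check."""
    return sum(mask) / len(mask) if mask else 0.0


def adjust_gradient_threshold(success_rate, low=20.0, high=60.0):
    """Assumed rule: a low success rate lowers the threshold so that more
    line segments are extracted; a high rate raises it to save computation."""
    rate = min(max(success_rate, 0.0), 1.0)
    return low + (high - low) * rate


original = [(0.0, 0.0), (10.0, 10.0), (5.0, 5.0)]
returned = [(0.2, 0.1), (13.0, 10.0), (5.0, 5.5)]
mask = forward_backward_mask(original, returned)
rate = tracking_success_rate(mask)
threshold = adjust_gradient_threshold(rate)
```

In a real pipeline the two point lists would come from an optical flow tracker run in both directions between consecutive frames.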
Funding: Funded jointly by the National High Technology Research and Development Program (863 Program, No. 2014AA06A610), special funds for basic scientific research business expenses of the Chinese Academy of Geological Sciences (No. YYWF201632), and the National Major Scientific Instruments and Equipment Development Projects (No. 2011YQ050060).
Abstract: Induced polarization (IP) 3D tomography with the similar central gradient array combines IP sounding and IP profiling to retrieve 3D resistivity and polarization data rapidly. The method is characterized by high spatial resolution and large probing depth. We discuss data acquisition and 3D IP imaging procedures using the central gradient array with variable electrode distances. A 3D geoelectric model was constructed and then numerically modeled. The modeling results suggest that this method can capture the features of real geoelectric models. The method was applied to a polymetallic mine in Gansu Province. The results suggest that IP 3D tomography captures the distribution of resistivity and polarization of subsurface media, delineates the extension of abrupt interfaces, and identifies mineralization.
Abstract: Because of the scattered nature of the network, data transmission in a distributed Mobile Ad-hoc Network (MANET) consumes more energy resources (ER) than in a centralized network, resulting in a shorter network lifespan (NL). To address this issue, we build an Enhanced Opportunistic Routing (EORP) protocol architecture. The goal of the proposed routing protocol is to manage the routing cost by employing power, load, and delay, and to manage the routing energy consumption based on the flooding of control packets from the destination node, thereby reducing the routing cost. Control packet exchange between the destination and all other nodes, on the other hand, can affect the overall efficiency of the system. The EORP protocol and the Multi-channel Cooperative Neighbour Discovery (MCCND) protocol are designed to detect the cooperative adjacent nodes of each node on the routing path as part of the routing path discovery process, which takes place during control packet transmission. These protocols are simulated to evaluate their performance across a wide range of packet rates using Constant Bit Rate (CBR) traffic. At a CBR packet rate of 20 packets per second, the results show that EORP-MCCND is 0.6 s quicker than the state-of-the-art protocols and achieves 0.6 s of end-to-end delay, 0.05 s of routing overhead delay, 120 s of network lifetime, and 20 J of energy consumption, which is much better than the state-of-the-art protocols.
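The abstract states that the routing cost is built from power, load, and delay but does not give the formula. One simple way to combine the three is a weighted sum over normalised metrics, sketched below. The weights, the normalisation to [0, 1] with lower values being better, and the function names are assumptions, not the EORP definition.

```python
def routing_cost(power, load, delay, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of three metrics, each assumed to be normalised to
    [0, 1] with lower values meaning a cheaper path."""
    w_power, w_load, w_delay = weights
    return w_power * power + w_load * load + w_delay * delay


def best_path(candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return the name of the candidate path with the lowest cost.

    `candidates` maps a path name to a (power, load, delay) tuple.
    """
    return min(
        candidates,
        key=lambda name: routing_cost(*candidates[name], weights=weights),
    )


paths = {
    "A": (0.9, 0.2, 0.4),
    "B": (0.3, 0.3, 0.3),
    "C": (0.5, 0.8, 0.1),
}
balanced_choice = best_path(paths)
delay_only_choice = best_path(paths, weights=(0.0, 0.0, 1.0))
```

Changing the weights shifts the choice between energy-saving and low-latency paths, which is the kind of trade-off the abstract describes.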
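Abstract: In audio-visual multimodal learning, heterogeneous learning rates let a single modality dominate training and suppress the learning of the other modalities, which weakens multimodal collaborative decision-making. To address this problem, a multimodal balanced learning method based on adaptive gradient modulation, named adaptive gradient modulation based compensation and regularization (AGM-CR), is proposed. First, modulation coefficients are introduced according to the differences in learning gradients between modalities to adaptively adjust the learning rate of each modality. Then, through a gradient equalization strategy, the gradient loss of each single modality is added to the total loss as a regularization term to constrain the gradient differences between modalities, further balancing the learning of each modality. Finally, experimental results show that on the CREMA-D and RAVDESS datasets AGM-CR improves classification accuracy by 2.5 and 3.3 percentage points, respectively, reduces the gradient fluctuation of the model over multiple iterations, and exhibits higher training stability and convergence speed. Compared with existing balancing methods, AGM-CR is plug-and-play and offers greater flexibility and generality.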
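The abstract does not give the form of the modulation coefficients. As a rough illustration of the idea, the sketch below slows down whichever modality currently has the larger gradient norm and leaves the other untouched. The tanh-based rule, the `alpha` parameter, and the function name are assumptions in the style of earlier gradient modulation work, not the AGM-CR formula.

```python
from math import tanh


def modulation_coefficients(grad_norm_audio, grad_norm_visual,
                            alpha=1.0, eps=1e-12):
    """Return (k_audio, k_visual), multipliers for each modality's gradient.

    The dominant modality (larger gradient norm) receives a coefficient
    below 1; the weaker modality keeps a coefficient of 1.
    """
    ratio = (grad_norm_audio + eps) / (grad_norm_visual + eps)
    if ratio > 1.0:
        return 1.0 - tanh(alpha * (ratio - 1.0)), 1.0
    return 1.0, 1.0 - tanh(alpha * (1.0 / ratio - 1.0))


# Audio gradients are twice as large, so only the audio branch is damped.
k_audio, k_visual = modulation_coefficients(2.0, 1.0)
```

During training, each coefficient would scale the gradient of its encoder before the optimizer step. The regularization term mentioned in the abstract is a separate component and is not shown here.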
Funding: Funded by Woosong University Academic Research 2024.
Abstract: This study investigates the application of Learnable Memory Vision Transformers (LMViT) for detecting metal surface flaws, comparing their performance with traditional CNNs, specifically ResNet18 and ResNet50, as well as other transformer-based models including Token to Token ViT, ViT without memory, and Parallel ViT. Leveraging a widely used steel surface defect dataset, the research applies data augmentation and t-distributed stochastic neighbor embedding (t-SNE) to enhance feature extraction and understanding. These techniques mitigated overfitting, stabilized training, and improved generalization capabilities. The LMViT model achieved a test accuracy of 97.22%, significantly outperforming ResNet18 (88.89%) and ResNet50 (88.90%), as well as Token to Token ViT (88.46%), ViT without memory (87.18%), and Parallel ViT (91.03%). Furthermore, LMViT exhibited superior training and validation performance, attaining a validation accuracy of 98.2% compared to 91.0% for ResNet18, 96.0% for ResNet50, and 89.12%, 87.51%, and 91.21% for Token to Token ViT, ViT without memory, and Parallel ViT, respectively. The findings highlight LMViT’s ability to capture long-range dependencies in images, an area where CNNs struggle due to their reliance on local receptive fields and hierarchical feature extraction. The additional transformer-based models also demonstrate improved performance over CNNs in capturing complex features, with LMViT excelling particularly at detecting subtle and complex defects, which is critical for maintaining product quality and operational efficiency in industrial applications. For instance, the LMViT model successfully identified fine scratches and minor surface irregularities that CNNs often misclassify. This study not only demonstrates LMViT’s potential for real-world defect detection but also underscores the promise of other transformer-based architectures such as Token to Token ViT, ViT without memory, and Parallel ViT in industrial scenarios where complex spatial relationships are key. Future research may focus on enhancing LMViT’s computational efficiency for deployment in real-time quality control systems.
Funding: Supported by the National Natural Science Foundation of China (No. U19A2063) and the Jilin Provincial Science & Technology Development Program of China (No. 20230201080GX).
Abstract: Although ray tracing produces high-fidelity, realistic images, it is considered computationally burdensome when implemented on a system with a high rendering rate. Perception-driven rendering techniques generate images with minimal noise and distortion that are generally acceptable to the human visual system, thereby reducing rendering costs. In this paper, we introduce a perception-entropy-driven temporal reusing method to accelerate real-time ray tracing. We first build a just noticeable difference (JND) model to represent the uncertainty of ray samples and image-space masking effects. Then, we expand the shading gradient through gradient max-pooling and gradient filtering to enlarge the visual receptive field. Finally, we dynamically optimize reusable time segments to improve the accuracy of temporal reusing. Compared with Monte Carlo ray tracing, our algorithm increases frames per second (fps) by 1.93× to 2.96× at 8 to 16 samples per pixel, significantly accelerating the Monte Carlo ray tracing process while maintaining visual quality.
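The gradient max-pooling step mentioned in this abstract can be pictured as spreading each large shading-gradient value over its neighbourhood, so that pixels next to a rapidly changing region are also treated as changing. The sketch below is a plain stride-1 max filter. The window radius and the function name are assumptions, since the abstract does not give the window size or the subsequent filtering.

```python
def gradient_max_pool(grad, radius=1):
    """Stride-1 max filter over a (2 * radius + 1) square window.

    `grad` is a 2D list of per-pixel gradient magnitudes. Windows are
    clipped at the image border.
    """
    height, width = len(grad), len(grad[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            out[y][x] = max(
                grad[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            )
    return out


# One strong gradient at (1, 1) spreads to its 3x3 neighbourhood.
gradients = [
    [0, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
pooled = gradient_max_pool(gradients)
```

A real-time renderer would run an equivalent filter on the GPU; the nested Python loops here only illustrate the operation.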