Abstract: Although person re-identification has made remarkable progress, occlusion caused by various obstacles remains a pressing challenge in real-world scenarios. To extract more effective features from occluded pedestrians, an occluded person re-identification method based on a learnable mask and position encoding (LMPE) is proposed. First, a learnable dual attention mask generator (LDAMG) is introduced; the generated masks adapt to different occlusion patterns and significantly improve recognition accuracy for occluded pedestrians. This module makes the network more flexible and better able to accommodate diverse occlusion situations, effectively overcoming the difficulties occlusion causes. Meanwhile, the network learns contextual information through the masks, further strengthening its understanding of the scene surrounding the pedestrian. In addition, to address the loss of position information in the Transformer, an occlusion aware position encoding fusion (OAPEF) module is introduced. It fuses position encodings from different levels, giving the network stronger representational power; by integrating image position encodings comprehensively, the model understands spatial relations between pedestrians more accurately and adapts better to occlusion. Finally, experiments show that the proposed LMPE achieves good results on the occluded datasets Occluded-Duke and Occluded-ReID as well as on the unoccluded datasets Market-1501 and DukeMTMC-ReID, verifying the effectiveness and superiority of the method.
Funding: Supported by the National Research Foundation of Singapore, AME Young Individual Research Grant (A2084c0167).
Abstract: Accurate remaining useful life (RUL) prediction is important in industrial systems: it prevents machines from operating under failure conditions and ensures that the system works reliably and efficiently. Recently, many deep learning based methods have been proposed to predict RUL. Among them, recurrent neural network (RNN) based approaches show a strong capability for capturing sequential information, which allows them to outperform convolutional neural network (CNN) based approaches on the RUL prediction task. In this paper, we question this common paradigm and argue that existing CNN based approaches are not designed according to the classic principles of CNNs, which reduces their performance. Additionally, the capacity for capturing sequential information is strongly affected by the receptive field of a CNN, which existing CNN based methods neglect. To solve these problems, we propose a series of new CNNs that achieve results competitive with RNN based methods. Compared with an RNN, a CNN processes the input signals in parallel, so the temporal order is not easily determined. To alleviate this issue, a position encoding scheme is developed to enhance the sequential information encoded by the CNN. The resulting position encoding based CNN, called PE-Net, is further improved and even outperforms RNN based methods. Extensive experiments are conducted on the C-MAPSS dataset, where PE-Net shows state-of-the-art performance.
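The abstract does not detail PE-Net's encoding, but the general idea of injecting order information into a parallel CNN can be sketched with Transformer-style sinusoidal channels appended to each time step of a sensor window. All names and the choice of sinusoidal encoding below are illustrative assumptions, not the paper's exact design:

```python
import math

def sinusoidal_position_encoding(seq_len, dim):
    """Transformer-style sinusoidal encoding: one dim-sized vector per time step."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def append_position_channels(window, pe_dim=4):
    """Concatenate position-encoding channels to each time step of a sensor
    window, so a CNN sliding over the window can see temporal order.

    `window` is a list of time steps, each a list of sensor readings."""
    pe = sinusoidal_position_encoding(len(window), pe_dim)
    return [step + pe_row for step, pe_row in zip(window, pe)]

window = [[0.1, 0.5], [0.2, 0.4], [0.3, 0.3]]  # 3 time steps, 2 sensors
enriched = append_position_channels(window, pe_dim=4)
```

A 1-D CNN fed `enriched` sees 6 channels per step: the 2 original sensors plus 4 position channels that uniquely identify each step's location in the window.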
Funding: Supported by the National Key Research Projects (No. 2016YFB0501403) and the National Demonstration Center for Experimental Remote Sensing & Information Engineering (Wuhan University).
Abstract: To enable automatic positioning of an unmanned aerial vehicle (UAV) during power inspection, a visual positioning method that uses an encoded sign as a cooperative target is proposed. First, we discuss how to design the encoded sign and propose a robust contour-based decoding algorithm. Second, the AdaBoost algorithm is used to train a classifier that detects the encoded sign in an image. Last, the position of the UAV is calculated from the projective relation between the object points and their corresponding image points. The experiment has two parts. First, simulated video data are used to verify the feasibility of the proposed method; the results show that the average absolute error in each direction is below 0.02 m. Second, a video acquired from an actual UAV flight is used to calculate the UAV's position; the calculated trajectory is consistent with the actual flight path. The method runs at 0.153 s per frame.
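The paper's full projective model is not reproduced in the abstract. Under a strongly simplified assumption (downward-looking camera, fronto-parallel square sign of known side length, known focal length), the pinhole relation alone already yields a position estimate; the function name and this fronto-parallel simplification are hypothetical, not the paper's method:

```python
import math

def uav_position_from_sign(image_pts, sign_size_m, f_px, cx, cy):
    """Estimate the camera (UAV) position relative to a square encoded sign.

    Simplified sketch: the camera looks straight down at a fronto-parallel
    sign of known side length `sign_size_m`, so depth follows from the
    pinhole relation  Z = f * S_real / s_pixels.
    `image_pts` are the sign's corner pixels in order TL, TR, BR, BL."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = image_pts
    # Mean side length of the projected square, in pixels.
    sides = [(x1-x0)**2+(y1-y0)**2, (x2-x1)**2+(y2-y1)**2,
             (x3-x2)**2+(y3-y2)**2, (x0-x3)**2+(y0-y3)**2]
    s_px = sum(math.sqrt(s) for s in sides) / 4.0
    z = f_px * sign_size_m / s_px                 # height above the sign
    # Sign centre offset from the principal point, scaled to metres.
    u = sum(p[0] for p in image_pts) / 4.0 - cx
    v = sum(p[1] for p in image_pts) / 4.0 - cy
    return (u * z / f_px, v * z / f_px, z)

# A 0.5 m sign seen as a 100 px square centred on the principal point
# of a camera with f = 1000 px sits 5 m below the camera.
pts = [(450, 450), (550, 450), (550, 550), (450, 550)]
x, y, z = uav_position_from_sign(pts, 0.5, 1000.0, 500.0, 500.0)
```

The paper's general case (arbitrary camera attitude) would instead solve the full perspective-n-point problem from the decoded corner correspondences.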
Funding: Supported by the Collaborative Tackling Project of the Yangtze River Delta SciTech Innovation Community (Nos. 2024CSJGG01503, 2024CSJGG01500), the Guangxi Key Research and Development Program (No. AB24010317), and the Jiangxi Provincial Key Laboratory of Electronic Data Control and Forensics (Jiangxi Police College) (No. 2025JXJYKFJJ002).
Abstract: Multimodal sentiment analysis aims to understand emotions from text, speech, and video data. However, current methods often overlook the dominant role of text and suffer from feature loss during integration. Given the varying importance of each modality across contexts, a central challenge in multimodal sentiment analysis lies in maximizing the use of rich intra-modal features while minimizing information loss during fusion. In response to these limitations, we propose a novel framework that integrates spatial position encoding and fusion embedding modules. In our model, text is treated as the core modality, while speech and video features are selectively incorporated through a position-aware fusion process. The spatial position encoding strategy preserves the internal structural information of the speech and visual modalities, enabling the model to capture localized intra-modal dependencies that are often overlooked. This design enhances the richness and discriminative power of the fused representation, enabling more accurate and context-aware sentiment prediction. Finally, we conduct comprehensive evaluations on two widely recognized standard datasets, CMU-MOSI and CMU-MOSEI, to validate the proposed model. The experimental results demonstrate its good performance and effectiveness on sentiment analysis tasks.
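The fusion module itself is not specified in the abstract. One common way to realize "text as the core modality" is to let a text feature act as the query of scaled dot-product attention over speech or video frames; the following minimal single-head sketch (all names illustrative) shows that step:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention: one text query vector
    attends over a sequence of speech/video feature vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return fused, weights

# The text feature (query) selects from three audio frames (keys == values here).
text_vec = [1.0, 0.0]
audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused, w = cross_attention(text_vec, audio, audio)
```

The frame most aligned with the text query receives the largest weight, so the fused vector stays anchored to the text representation while still drawing on the auxiliary modality.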
Funding: Supported by the National Key Research and Development Program of China (No. 2021YFB2206200).
Abstract: Existing image captioning models usually build the relation between visual information and words to generate captions, but lack spatial information and object classes. To address this, we propose a novel Position-Class Awareness Transformer (PCAT) network that serves as a bridge between visual features and captions by embedding spatial information and awareness of object classes. We construct the PCAT network by proposing a novel Grid Mapping Position Encoding (GMPE) method and refining the encoder-decoder framework. First, GMPE maps the regions of objects to grids, calculates the relative distances among objects, and quantizes them; we also adapt self-attention to GMPE. Then, we propose a Classes Semantic Quantization strategy that extracts semantic information from the object classes, which is employed to facilitate feature embedding and to refine the encoder-decoder framework. To capture the interaction between multimodal features, we propose Object Classes Awareness (OCA) to refine the encoder and decoder, namely OCAE and OCAD, respectively. Finally, we combine GMPE, OCAE, and OCAD in various configurations to complete the full PCAT. We evaluate the method on the MSCOCO dataset; the results demonstrate that PCAT outperforms the other competitive methods.
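The abstract names three GMPE steps: mapping object regions to grids, computing relative distances, and quantization. A minimal sketch of those steps might look as follows; the grid size, the Chebyshev distance, and all names are assumptions, not the paper's exact design:

```python
def grid_cell(box, image_w, image_h, grid=8):
    """Map an object's bounding-box centre to an integer grid cell."""
    x0, y0, x1, y1 = box
    gx = min(int((x0 + x1) / 2 / image_w * grid), grid - 1)
    gy = min(int((y0 + y1) / 2 / image_h * grid), grid - 1)
    return gx, gy

def quantized_relative_distances(boxes, image_w, image_h, grid=8):
    """Pairwise quantized distance between objects: Chebyshev distance
    between their grid cells, one row per object."""
    cells = [grid_cell(b, image_w, image_h, grid) for b in boxes]
    return [[max(abs(ax - bx), abs(ay - by)) for (bx, by) in cells]
            for (ax, ay) in cells]

# Two objects in a 400x400 image: top-left corner vs. lower-right region.
boxes = [(0, 0, 100, 100), (300, 300, 400, 400)]
dist = quantized_relative_distances(boxes, 400, 400, grid=8)
```

The resulting integer distance matrix is exactly the kind of small discrete vocabulary that can be embedded and added as a bias inside self-attention, which is presumably why the paper also adapts self-attention to GMPE.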
Abstract: The main task of discourse element identification is to recognize and classify discourse element units. To address the insufficient understanding of contextual dependencies in discourse element identification, a BiLSTM-Attention based model is proposed to improve identification accuracy for argumentative essays. The model uses sentence structure and position encoding to identify the constituent relations of sentences, and a bidirectional long short-term memory (BiLSTM) network to further obtain deep context-related information. An attention mechanism is introduced to optimize the model's feature vectors and improve text classification accuracy. Finally, inter-sentence multi-head self-attention captures the relations between sentences in content and structure, compensating for dependencies between distant sentences. Compared with baseline models such as HBiLSTM and BERT, under the same parameters and experimental conditions, accuracy improves by 1.3% on the Chinese dataset and 3.6% on the English dataset, verifying the effectiveness of the model for discourse element identification.
Abstract: In lane detection research, existing algorithms can efficiently detect lane lines under good lighting, but detection in low light still suffers from a high miss rate. To address this, an algorithm that exploits the structural relations among lane lines, the instance association network (IANet), is proposed to aid detection in low light. Features at the starting point of each lane line, together with the global feature map, generate a unique mask per lane line; applying the masks to the feature map separates features at the instance level. An instance-level attention mechanism then associates the separated features, enabling effective information exchange between instances; absolute position encoding is introduced before association to strengthen the model's attention to positional relations among lane lines. Finally, lane lines are detected precisely by locating key points on each line and computing offsets. On the CULane dataset, IANet achieves an overall score of 75.7% and a night-scene score of 71.9%, clearly higher than other algorithms, and shows good robustness across various lighting-affected environments; the proposed instance feature association significantly reduces the miss rate of lane detection in low light.
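The final detection step, locating key points on each lane and refining them with offsets, can be sketched as follows; the score/offset map layout and the threshold are illustrative assumptions, not IANet's exact head design:

```python
def decode_lane_points(row_scores, offsets, threshold=0.5):
    """Decode lane key points row by row: each row of the score map gives
    the coarse column of the lane, and the offset map refines it to
    sub-cell precision (a simplified keypoint-plus-offset decoder)."""
    points = []
    for y, (scores, offs) in enumerate(zip(row_scores, offsets)):
        best = max(range(len(scores)), key=lambda c: scores[c])
        if scores[best] >= threshold:
            points.append((y, best + offs[best]))
    return points

# 3 rows x 4 columns of lane-presence scores plus per-cell column offsets.
scores = [[0.1, 0.9, 0.2, 0.0],
          [0.0, 0.2, 0.8, 0.1],
          [0.0, 0.1, 0.3, 0.2]]   # last row below threshold -> no point
offsets = [[0.0, 0.25, 0.0, 0.0],
           [0.0, 0.0, -0.25, 0.0],
           [0.0, 0.0, 0.0, 0.0]]
pts = decode_lane_points(scores, offsets, threshold=0.5)
```

Thresholding per row is what lets the decoder skip rows where the lane is occluded or the score is too weak, which matters most in the low-light scenes the paper targets.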
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62073256) and the Shaanxi Provincial Science and Technology Department (Grant No. 2023-YBGY-342).
Abstract: To solve the problem of identifying and measuring two projectiles hitting the target at the same time, this paper proposes a projectile coordinate test method combining three photoelectric encoder detection screens, and establishes a coordinate calculation model for two projectiles reaching the same detection screen simultaneously. The design method of the three detection screens and the position coordinate recognition algorithm of the blocked array photoelectric detector, for a projectile passing through a detection screen, are studied. Using the screen projection method, the intersection line equations of the projectile and the line laser, with the main detection screen as the core coordinate plane, are established, and the projectile coordinate data set formed by any two detection screens is constructed. The principle of minimum error over the coordinate data sets is used to determine the coordinates of the two projectiles hitting the target at the same time. The rationality and feasibility of the proposed test method are verified by experiments and comparative tests.
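The minimum-error principle over coordinate data sets can be illustrated for the two-projectile case: coordinate candidates computed from two different screen pairs are matched by whichever pairing has the smaller total discrepancy. This is a simplified two-point sketch; the function name and the final averaging step are assumptions, not the paper's exact model:

```python
import math

def match_projectiles(set_a, set_b):
    """Resolve the two-projectile ambiguity: candidate coordinates from two
    different screen pairs are matched by choosing the pairing with minimum
    total error (the minimum-error principle)."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    direct = dist(set_a[0], set_b[0]) + dist(set_a[1], set_b[1])
    crossed = dist(set_a[0], set_b[1]) + dist(set_a[1], set_b[0])
    if direct <= crossed:
        pairs = [(set_a[0], set_b[0]), (set_a[1], set_b[1])]
    else:
        pairs = [(set_a[0], set_b[1]), (set_a[1], set_b[0])]
    # Final coordinate: average of each matched pair; error: the pairing cost.
    coords = [((p[0] + q[0]) / 2, (p[1] + q[1]) / 2) for p, q in pairs]
    return coords, min(direct, crossed)

# Candidates from screen pair A and, in swapped order, from screen pair B:
# the crossed pairing has far smaller error, so it identifies the projectiles.
a = [(10.0, 20.0), (30.0, 5.0)]
b = [(30.2, 5.1), (9.9, 19.8)]
coords, err = match_projectiles(a, b)
```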
Abstract: Discourse element identification plays an important role in automated essay scoring; improving its accuracy helps enhance both the effectiveness and the interpretability of automated scoring. However, the task faces challenges such as context dependence and sentence ambiguity. Traditional methods based on rules and feature engineering struggle to capture the complex semantics and long-range dependencies in text, while deep learning methods, although able to learn text features automatically, still underuse key position information. To address these problems, a discourse element identification model with hybrid position encoding, HPE-BiLSTM (Hybrid Position Encoding Bidirectional Long Short-Term Memory), is proposed. The model first obtains sentence representations from pretrained word vectors, then extracts sentence-level features with a bidirectional long short-term memory network. On top of the sentence-level features, a hybrid position encoding scheme ensures that key position information is propagated effectively. Finally, a linear layer and an activation function perform discourse element identification. Experiments on an argumentative essay dataset compare the model with five baselines: Feature-based, BERT, BiLSTM, DiSA, and DCRGNN. The results show that HPE-BiLSTM reaches an accuracy of 0.693 and an F1 score of 0.684 for discourse element identification, outperforming the other models.
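The exact hybrid scheme is not given in the abstract. A common "hybrid" position encoding sums a fixed sinusoidal term with a learnable per-position term; the sketch below assumes that design (the learnable table would normally be trained, and all names are illustrative):

```python
import math

def sinusoidal_pe(pos, dim):
    """Fixed sinusoidal encoding for one position."""
    return [math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / dim))
            for i in range(dim)]

def hybrid_pe(num_sentences, dim, learnable_table):
    """Hybrid encoding: fixed sinusoidal term plus a learnable per-position
    term (here the table is simply given instead of trained)."""
    return [[s + l for s, l in zip(sinusoidal_pe(p, dim), learnable_table[p])]
            for p in range(num_sentences)]

table = [[0.0] * 4 for _ in range(3)]   # untrained (zero) learnable part
pe = hybrid_pe(3, 4, table)
```

With the learnable part at zero, the encoding reduces to the sinusoidal term; training then lets each sentence position deviate from the fixed pattern where the data demands it.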
Abstract: In printed circuit board (PCB) defect detection, the abundance of small-target defects easily leads to missed and false detections. To address this, a small-target PCB surface defect detection algorithm based on an improved enhance pyramid real time detection transformer (EP-RTDETR) is proposed. First, CSPDarknet replaces the original backbone to strengthen feature extraction. Second, a spatial moving point convolutional gated linear unit (SMPCGLU) module is designed to rework the Bottleneck in C2f, enhancing the gated modulation and spatial adaptivity of features. Third, learnable position encoding is introduced and the scale interaction mechanism is improved, strengthening the response to information at different positions. Then, based on the cross-scale feature-fusion module (CCFM), a small object enhance pyramid (SOEP) structure is designed; the enhanced feature layers and fine-grained feature fusion allow the model to locate and identify small targets more accurately. Finally, an MPDIoU (minimum point distance-based intersection over union) + NWD (normalized Wasserstein distance) loss is designed, which accelerates convergence while paying more attention to small-target defects and yields more accurate regression. Experiments show that, compared with the baseline model, precision P improves by 4.6%, recall R by 5.1%, and mean average precision mAP50 by 4.6%, while the parameter count is reduced by 16.38 M, floating-point operations are reduced by 48.3, and FPS increases by 8.51, giving better small-target PCB surface defect detection.
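The MPDIoU and NWD terms can be sketched from their published formulations: MPDIoU penalizes IoU by the squared distances between matching box corners, normalized by the squared image diagonal, and NWD maps the 2-Wasserstein distance between Gaussians fitted to the boxes through an exponential. The constant `c = 12.8` is a common choice from the NWD literature; treat the details below as assumptions rather than this paper's exact implementation:

```python
import math

def iou(a, b):
    """Plain IoU of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def mpdiou(a, b, img_w, img_h):
    """IoU penalised by squared distances between matching corners,
    normalised by the squared image diagonal."""
    d1 = (a[0]-b[0])**2 + (a[1]-b[1])**2          # top-left corners
    d2 = (a[2]-b[2])**2 + (a[3]-b[3])**2          # bottom-right corners
    diag = img_w**2 + img_h**2
    return iou(a, b) - d1/diag - d2/diag

def nwd(a, b, c=12.8):
    """Normalised Wasserstein distance between Gaussians fitted to the
    boxes, parameterised as (cx, cy, w/2, h/2)."""
    ga = ((a[0]+a[2])/2, (a[1]+a[3])/2, (a[2]-a[0])/2, (a[3]-a[1])/2)
    gb = ((b[0]+b[2])/2, (b[1]+b[3])/2, (b[2]-b[0])/2, (b[3]-b[1])/2)
    w2 = math.sqrt(sum((x-y)**2 for x, y in zip(ga, gb)))
    return math.exp(-w2 / c)

box = (10, 10, 20, 20)
print(mpdiou(box, box, 640, 640))   # identical boxes -> 1.0
print(nwd(box, box))                # identical boxes -> 1.0
```

NWD's appeal for small targets is that, unlike IoU, it stays smooth and informative even when tiny boxes barely overlap, which is why pairing it with an IoU-family term is a natural design for a PCB defect detector.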