Although person re-identification has made remarkable progress, occlusion caused by various obstacles remains a pressing challenge in real-world scenarios. To extract more effective features from occluded pedestrians, an occluded person re-identification method based on a learnable mask and position encoding (LMPE) is proposed. First, a learnable dual attention mask generator (LDAMG) is introduced; the generated masks adapt to different occlusion patterns and significantly improve recognition accuracy for occluded pedestrians. This module makes the network more flexible and better able to handle diverse occlusion conditions, effectively mitigating the interference caused by occlusion; by learning contextual information through the mask, it further strengthens the network's understanding of the scene surrounding the pedestrian. In addition, to address the loss of positional information in Transformers, an occlusion-aware position encoding fusion (OAPEF) module is introduced. It fuses position encodings from different layers, giving the network stronger representational power; by comprehensively integrating image position encodings, the model understands the spatial relations between pedestrians more accurately and adapts better to occlusion. Finally, experiments show that the proposed LMPE achieves good results on the occluded datasets Occluded-Duke and Occluded-ReID as well as the unoccluded datasets Market-1501 and DukeMTMC-ReID, verifying the effectiveness and superiority of the method.
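The abstract above does not specify the internals of the LDAMG and OAPEF modules, so the following numpy sketch is only a rough illustration of the two ideas it names: down-weighting occluded patches with a learnable mask, and fusing position encodings from several layers. The function names (`fuse_position_encodings`, `apply_mask`) and the sigmoid/softmax choices are assumptions, not the paper's design.

```python
import numpy as np

def sinusoidal_pe(n_pos, d_model):
    """Standard sinusoidal position encoding, shape (n_pos, d_model)."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def fuse_position_encodings(pes, weights):
    """Softmax-weighted sum of position encodings from several layers,
    mimicking the multi-level fusion idea behind OAPEF."""
    w = np.exp(weights - np.max(weights))
    w = w / w.sum()
    return sum(wi * pe for wi, pe in zip(w, pes))

def apply_mask(features, mask_logits):
    """Down-weight (occluded) patches with a sigmoid mask, a stand-in
    for the learnable mask produced by LDAMG."""
    mask = 1.0 / (1.0 + np.exp(-mask_logits))          # (n_patches, 1)
    return features * mask

# toy usage: 16 image patches, 8-dim features, two PE levels
feats = np.random.default_rng(0).normal(size=(16, 8))
pe = fuse_position_encodings([sinusoidal_pe(16, 8), sinusoidal_pe(16, 8) * 0.5],
                             weights=np.array([0.0, 1.0]))
out = apply_mask(feats + pe, mask_logits=np.zeros((16, 1)))
```

In a trained model the mask logits and fusion weights would be learned parameters; here they are fixed so the sketch stays self-contained.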
Funding: supported by the Collaborative Tackling Project of the Yangtze River Delta SciTech Innovation Community (Nos. 2024CSJGG01503, 2024CSJGG01500), the Guangxi Key Research and Development Program (No. AB24010317), and the Jiangxi Provincial Key Laboratory of Electronic Data Control and Forensics (Jiangxi Police College) (No. 2025JXJYKFJJ002).
Abstract: Multimodal sentiment analysis aims to understand emotions from text, speech, and video data. However, current methods often overlook the dominant role of text and suffer from feature loss during integration. Given the varying importance of each modality across contexts, a central challenge in multimodal sentiment analysis is to exploit rich intra-modal features while minimizing information loss during fusion. To address these limitations, we propose a framework that integrates spatial position encoding and fusion embedding modules. In our model, text is treated as the core modality, while speech and video features are selectively incorporated through a position-aware fusion process. The spatial position encoding strategy preserves the internal structure of the speech and visual modalities, enabling the model to capture localized intra-modal dependencies that are often overlooked. This design enhances the richness and discriminative power of the fused representation, allowing more accurate and context-aware sentiment prediction. Finally, we conduct comprehensive evaluations on two widely used benchmark datasets, CMU-MOSI and CMU-MOSEI, to validate the proposed model. The experimental results demonstrate its effectiveness on sentiment analysis tasks.
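One common way to realize "text as the core modality" is to let text features act as attention queries over the position-encoded speech and video features. The numpy sketch below shows that generic pattern under stated assumptions; it is not the paper's exact fusion module, and all names here (`cross_modal_attention`, `position_encode`) are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_encode(x):
    """Add sinusoidal positions so intra-modal order survives fusion."""
    n, d = x.shape
    pos = np.arange(n)[:, None]
    i = np.arange(d)[None, :]
    ang = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return x + np.where(i % 2 == 0, np.sin(ang), np.cos(ang))

def cross_modal_attention(text, other, d_k):
    """Text-as-query scaled dot-product attention over another modality."""
    scores = text @ other.T / np.sqrt(d_k)       # (T_text, T_other)
    return softmax(scores, axis=-1) @ other      # (T_text, d)

# toy usage: 5 time steps, 8-dim features per modality
rng = np.random.default_rng(1)
text, speech, video = (rng.normal(size=(5, 8)) for _ in range(3))
fused = text + cross_modal_attention(text, position_encode(speech), 8) \
             + cross_modal_attention(text, position_encode(video), 8)
```

The residual connection keeps the text representation dominant while speech and video contribute selectively, which matches the stated design intent at a high level.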
Funding: supported by the National Research Foundation of Singapore, AME Young Individual Research Grant (A2084c0167).
Abstract: Accurate remaining useful life (RUL) prediction is important in industrial systems: it prevents machines from operating under failure conditions and ensures that the system works reliably and efficiently. Recently, many deep learning based methods have been proposed to predict RUL. Among them, recurrent neural network (RNN) based approaches show a strong capability for capturing sequential information, which allows them to outperform convolutional neural network (CNN) based approaches on the RUL prediction task. In this paper, we question this common paradigm and argue that existing CNN based approaches are not designed according to the classic principles of CNN, which reduces their performance. Additionally, the capacity for capturing sequential information is strongly affected by the receptive field of a CNN, which existing CNN based methods neglect. To solve these problems, we propose a series of new CNNs that are competitive with RNN based methods. Compared with an RNN, a CNN processes its input in parallel, so the temporal order of the sequence is not easily captured. To alleviate this, a position encoding scheme is developed to enhance the sequential information encoded by the CNN. The resulting position encoding based CNN, called PE-Net, performs even better than RNN based methods. Extensive experiments are conducted on the C-MAPSS dataset, where PE-Net shows state-of-the-art performance.
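A minimal way to give a CNN access to temporal order is to append a position channel to each sensor window before convolution. The sketch below illustrates that idea with a normalized time-index channel and a hand-rolled multi-channel 1D convolution; PE-Net's actual encoding scheme and architecture are not specified in the abstract, so everything here is an assumption for illustration.

```python
import numpy as np

def position_channel(window_len):
    """A normalized time-index channel in [0, 1]; a stand-in for the
    paper's position encoding scheme, which may differ."""
    return (np.arange(window_len) / (window_len - 1))[None, :]

def conv1d_valid(x, kernels):
    """Multi-channel 'valid' 1D cross-correlation: x (C, L), kernels (F, C, K)."""
    f, c, k = kernels.shape
    out = np.zeros((f, x.shape[1] - k + 1))
    for fi in range(f):
        for ci in range(c):
            # np.convolve flips the kernel, so flip it back for correlation
            out[fi] += np.convolve(x[ci], kernels[fi, ci][::-1], mode="valid")
    return out

# toy usage: 3 sensor channels over 30 time steps, plus one position channel
rng = np.random.default_rng(2)
window = rng.normal(size=(3, 30))
x = np.vstack([window, position_channel(30)])            # (4, 30)
features = conv1d_valid(x, rng.normal(size=(8, 4, 5)))   # (8, 26)
```

Because the position channel is just another input channel, each convolutional filter can learn how strongly its response should depend on where in the degradation trajectory the window sits.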
Funding: supported by the National Key Research Projects (No. 2016YFB0501403) and the National Demonstration Center for Experimental Remote Sensing & Information Engineering (Wuhan University).
Abstract: To enable automatic positioning of an unmanned aerial vehicle (UAV) during power inspection, a visual positioning method that uses an encoded sign as a cooperative target is proposed. First, we discuss how to design the encoded sign and propose a robust contour-based decoding algorithm. Second, the AdaBoost algorithm is used to train a classifier that detects the encoded sign in an image. Finally, the position of the UAV is calculated from the projective relation between the object points and their corresponding image points. The experiment includes two parts. First, simulated video data are used to verify the feasibility of the proposed method; the results show that the average absolute error in each direction is below 0.02 m. Second, video acquired from an actual UAV flight is used to calculate the position of the UAV; the calculated trajectory is consistent with the actual flight path. The method runs at 0.153 s per frame.
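For a planar sign, the projective relation between object points and image points is a 3×3 homography, which can be estimated by the standard direct linear transform (DLT). The sketch below shows only this estimation step, assuming at least four corner correspondences; the full pipeline described above would additionally recover the UAV pose from the homography and the camera intrinsics, which is omitted here.

```python
import numpy as np

def estimate_homography(obj_pts, img_pts):
    """DLT estimate of the homography mapping planar sign coordinates to
    image coordinates, from >= 4 point correspondences."""
    rows = []
    for (x, y), (u, v) in zip(obj_pts, img_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the homography is the null vector of the stacked constraint matrix
    _, _, vt = np.linalg.svd(np.asarray(rows, float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def project(h, pts):
    """Apply a homography to 2D points (homogeneous divide included)."""
    p = np.c_[pts, np.ones(len(pts))] @ h.T
    return p[:, :2] / p[:, 2:3]

# toy check: recover a known homography from the 4 corners of a unit sign
h_true = np.array([[1.2, 0.1, 5.0], [-0.2, 0.9, 3.0], [1e-3, 2e-3, 1.0]])
corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
h_est = estimate_homography(corners, project(h_true, corners))
```

With exact, noise-free correspondences the four corners determine the homography uniquely up to scale, so the estimate matches the ground truth after normalization.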
Funding: supported by the National Natural Science Foundation of China (No. 61205106).
Abstract: Considering the influence of highly random atmospheric turbulence, severe pointing errors, and highly dynamic links on the transmission performance of mobile multiple-input multiple-output (MIMO) free-space optics (FSO) communication systems, this paper establishes a channel model for the mobile platform. Based on the combination of Alamouti space-time coding and time-hopping ultra-wideband (TH-UWB) communications, a novel repetition space-time coding (RSTC) method for mobile 2×2 free-space optical communications with pulse position modulation (PPM) is developed. In particular, two decoding methods are derived: equal gain combining (EGC) maximum likelihood detection (MLD) and correlation matrix detection (CMD). Under a quasi-static fading and weak-turbulence channel model, simulation results show that, whether or not the channel state information (CSI) is known, the coded system achieves significantly better symbol error rate (SER) performance than the uncoded one. In other words, transmit diversity can be achieved while conveying information only through the time delays of the modulated signals transmitted from different antennas. CMD achieves almost the same signal-combining effect as maximal ratio combining (MRC). However, when the channel correlation increases, the SER performance of the coded 2×2 system degrades significantly.
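To make the PPM-with-EGC idea concrete, here is a heavily simplified numpy sketch of a 2×2 intensity-modulation link: both transmitters repeat the same PPM symbol (repetition coding), both receivers are equal-gain combined, and the ML decision for equiprobable symbols reduces to picking the slot with the largest combined energy. This omits the Alamouti-type RSTC structure, TH-UWB, turbulence fading, and CMD from the paper, so treat it as a toy model only.

```python
import numpy as np

def ppm_modulate(symbol, n_slots):
    """Q-ary PPM: a unit pulse in the slot indexed by the symbol."""
    s = np.zeros(n_slots)
    s[symbol] = 1.0
    return s

def egc_mld_decode(received):
    """Equal-gain combine the receive apertures (sum over receivers),
    then choose the slot with the largest combined energy."""
    return int(np.argmax(received.sum(axis=0)))

# toy 2x2 link: h[i, j] is the intensity gain from transmitter j to receiver i
rng = np.random.default_rng(3)
h = np.array([[0.9, 0.7], [0.6, 0.8]])
symbol, n_slots = 2, 4
tx = ppm_modulate(symbol, n_slots)
# both transmitters send tx, so each receiver sees the summed gains plus noise
rx = h.sum(axis=1)[:, None] * tx + 0.05 * rng.normal(size=(2, n_slots))
decoded = egc_mld_decode(rx)
```

Even this toy model shows why diversity helps: the pulse slot collects energy from every transmit-receive path, so a single faded path rarely flips the argmax decision.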
Abstract: The main task of discourse element identification is to detect discourse element units and classify them. To address the insufficient modeling of contextual dependencies in discourse element identification, a BiLSTM-Attention based model is proposed to improve the accuracy of identifying discourse elements in argumentative essays. The model uses sentence structure and position encoding to identify the constituent relations of sentences, and a bidirectional long short-term memory (BiLSTM) network to capture deeper context-dependent information; an attention mechanism is introduced to refine the model's feature vectors and improve text-classification accuracy; finally, inter-sentence multi-head self-attention captures the relations between sentences in both content and structure, compensating for dependencies between distant sentences. Compared with baseline models such as HBiLSTM and BERT, under the same parameters and experimental conditions, accuracy improves by 1.3% on a Chinese dataset and 3.6% on an English dataset, verifying the effectiveness of the model for discourse element identification.
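The two attention steps named above, attention pooling over BiLSTM token states and multi-head self-attention across sentence vectors, can be sketched in numpy as follows. The BiLSTM itself is replaced by random "token states", and the heads use no learned Q/K/V projections, so this is an illustrative skeleton rather than the paper's model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(h, w):
    """Score each token state with a vector w, then take the
    softmax-weighted sum: the usual attention layer over BiLSTM outputs."""
    a = softmax(h @ w)               # (T,) attention weights
    return a @ h                     # (d,) sentence vector

def multi_head_self_attention(x, n_heads):
    """Projection-free multi-head self-attention over sentence vectors:
    split the feature dim into heads, attend within each, concatenate."""
    heads = []
    for xs in np.split(x, n_heads, axis=1):
        scores = xs @ xs.T / np.sqrt(xs.shape[1])
        heads.append(softmax(scores, axis=-1) @ xs)
    return np.concatenate(heads, axis=1)      # (n_sentences, d)

# toy usage: 6 token states of dim 8 -> one sentence vector;
# 5 sentence vectors -> contextualized sentence representations
rng = np.random.default_rng(4)
states = rng.normal(size=(6, 8))              # stand-in for BiLSTM outputs
sent_vec = attention_pool(states, rng.normal(size=8))
doc = multi_head_self_attention(rng.normal(size=(5, 8)), n_heads=2)
```

Because self-attention connects every sentence pair directly, distant sentences interact in one step, which is exactly the long-range dependency gap the abstract says this layer is meant to close.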