Funding: Supported by the Collaborative Tackling Project of the Yangtze River Delta Sci-Tech Innovation Community (Nos. 2024CSJGG01503 and 2024CSJGG01500), the Guangxi Key Research and Development Program (No. AB24010317), and the Jiangxi Provincial Key Laboratory of Electronic Data Control and Forensics (Jiangxi Police College) (No. 2025JXJYKFJJ002).
Abstract: Multimodal sentiment analysis aims to understand emotions from text, speech, and video data. However, current methods often overlook the dominant role of text and suffer from feature loss during integration. Given the varying importance of each modality across different contexts, a central challenge in multimodal sentiment analysis lies in maximizing the use of rich intra-modal features while minimizing information loss during fusion. To address these limitations, we propose a novel framework that integrates spatial position encoding and fusion embedding modules. In our model, text is treated as the core modality, while speech and video features are selectively incorporated through a position-aware fusion process. The spatial position encoding strategy preserves the internal structural information of the speech and visual modalities, enabling the model to capture localized intra-modal dependencies that are often overlooked. This design enhances the richness and discriminative power of the fused representation, supporting more accurate and context-aware sentiment prediction. Finally, we conduct comprehensive evaluations on two widely recognized benchmark datasets, CMU-MOSI and CMU-MOSEI, to validate the proposed model. The experimental results demonstrate its effectiveness for sentiment analysis tasks.
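The abstract does not specify the exact form of the spatial position encoding; a minimal sketch, assuming a standard sinusoidal scheme (as in the Transformer) added element-wise to per-frame speech or video features, is:

```python
import math

def sinusoidal_position_encoding(num_positions, dim):
    """Standard sinusoidal position encoding; the paper's exact
    spatial scheme may differ from this common choice."""
    pe = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def add_position_encoding(features, pe):
    """Inject positional structure by element-wise addition to
    modality features (one frame per row)."""
    return [[f + p for f, p in zip(frow, prow)]
            for frow, prow in zip(features, pe)]
```

Here `features` stands in for a sequence of speech or visual frame embeddings; the encoded output would then enter the fusion embedding module alongside the text representation.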
Funding: Supported by the National Research Foundation of Singapore, AME Young Individual Research Grant (A2084c0167).
Abstract: Accurate remaining useful life (RUL) prediction is important in industrial systems. It prevents machines from operating under failure conditions and ensures that the industrial system works reliably and efficiently. Recently, many deep learning based methods have been proposed to predict RUL. Among these methods, recurrent neural network (RNN) based approaches show a strong capability for capturing sequential information, which allows them to outperform convolutional neural network (CNN) based approaches on the RUL prediction task. In this paper, we question this common paradigm and argue that existing CNN based approaches are not designed according to the classic principles of CNN, which reduces their performance. Additionally, the capacity to capture sequential information is strongly affected by the receptive field of a CNN, a factor neglected by existing CNN based methods. To solve these problems, we propose a series of new CNNs that show competitive results against RNN based methods. Compared with an RNN, a CNN processes the input signals in parallel, so the temporal order is not easily recovered. To alleviate this issue, a position encoding scheme is developed to enhance the sequential information encoded by the CNN. With this scheme, our position encoding based CNN, called PE-Net, is further improved and even outperforms RNN based methods. Extensive experiments are conducted on the C-MAPSS dataset, where PE-Net shows state-of-the-art performance.
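The abstract leaves PE-Net's encoding scheme unspecified; one common way to make temporal order visible to a CNN, sketched here purely as an illustration, is to append a normalized time index as an extra input channel:

```python
def append_position_channel(signal):
    """signal: list of time steps, each a list of sensor readings.
    Appends a position channel in [0, 1] so a CNN, which sees the
    window in parallel, can recover where each step sits in time.
    This is an illustrative scheme, not necessarily PE-Net's."""
    T = len(signal)
    return [row + [t / (T - 1) if T > 1 else 0.0]
            for t, row in enumerate(signal)]
```

A degradation window of T steps with C sensor channels thus becomes a T x (C + 1) input, and the added channel grows monotonically from 0 to 1 across the window.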
Funding: Supported by the National Key Research Projects (No. 2016YFB0501403) and the National Demonstration Center for Experimental Remote Sensing & Information Engineering (Wuhan University).
Abstract: To enable an unmanned aerial vehicle (UAV) to position itself automatically during power inspection, a visual positioning method that uses an encoded sign as a cooperative target is proposed. Firstly, we discuss how to design the encoded sign and propose a robust contour-based decoding algorithm. Secondly, the AdaBoost algorithm is used to train a classifier that detects the encoded sign in an image. Lastly, the position of the UAV is calculated from the projective relation between the object points and their corresponding image points. The experiment includes two parts. First, simulated video data is used to verify the feasibility of the proposed method; the results show that the average absolute error in each direction is below 0.02 m. Second, a video acquired from an actual UAV flight is used to calculate the position of the UAV; the results show that the calculated trajectory is consistent with the actual flight path. The method runs at 0.153 s per frame.
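The projective relation mentioned above is the standard pinhole model, x ~ K[R|t]X; pose estimation (e.g. PnP) inverts it to recover the camera position from known sign corners. A minimal forward-projection sketch (the matrices below are illustrative values, not from the paper):

```python
def project_point(K, R, t, X):
    """Project a 3-D object point X into pixel coordinates via
    x ~ K (R X + t), then dehomogenize. K is the 3x3 intrinsic
    matrix, R a 3x3 rotation, t a translation (all nested lists)."""
    # Camera-frame coordinates: Xc = R @ X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Homogeneous pixel coordinates: x = K @ Xc
    x = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    return (x[0] / x[2], x[1] / x[2])
```

For example, with identity rotation, t = [0, 0, 5] (sign 5 m ahead), and an illustrative K with focal length 100 px and principal point (50, 50), the sign's center [0, 0, 0] projects to the principal point (50.0, 50.0). Solving the inverse problem from several such correspondences yields the UAV pose.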