Funding: Supported and funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R410), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Recognizing human interactions in RGB videos is a critical task in computer vision, with applications in video surveillance. Existing deep learning-based architectures have achieved strong results, but they are computationally intensive, sensitive to changes in video resolution, and often fail in crowded scenes. We propose a novel hybrid system that is computationally efficient, robust to degraded video quality, and able to filter out irrelevant individuals, making it suitable for real-world use. The system combines multi-modal handcrafted features for interaction representation with a deep learning classifier that captures complex dependencies. Using Mask R-CNN and YOLO11-Pose, we extract grayscale silhouettes and keypoint coordinates of the interacting individuals, while a proposed algorithm filters out irrelevant individuals. From these, we extract silhouette-based features (local ternary pattern and histogram of optical flow) and keypoint-based features (distances, angles, and velocities) that capture distinct spatial and temporal information. A bidirectional long short-term memory network (BiLSTM) then classifies the interactions. Extensive experiments on the UT Interaction, SBU Kinect Interaction, and ISR-UOL 3D Social Activity datasets show that the system achieves competitive accuracy. They also validate the effectiveness of the chosen features and classifier, as well as the system's computational efficiency and robustness to occlusion.
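The keypoint-based features named above (distances, angles, and velocities) can be sketched as follows, under stated assumptions: keypoints are 2-D (x, y) pixel coordinates in a fixed joint order, as a pose estimator would output; distances are taken between the same joint of the two interacting people; and velocity is reduced to per-frame speed. The helper names, the joint triplets, and the feature layout are hypothetical, since the abstract does not give the exact definitions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D keypoints."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at keypoint b between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        return 0.0  # degenerate case: coincident keypoints
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to [-1, 1] to guard acos against rounding error.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def speeds(track: Sequence[Point], fps: float) -> List[float]:
    """Per-frame speed (pixels per second) of one keypoint tracked over time."""
    return [distance(track[i - 1], track[i]) * fps for i in range(1, len(track))]


def frame_features(p1: Sequence[Point], p2: Sequence[Point],
                   triplets: Sequence[Tuple[int, int, int]]) -> List[float]:
    """Hypothetical per-frame vector: joint-to-joint distances between the
    two people, followed by joint angles of person 1 for the given triplets."""
    dists = [distance(a, b) for a, b in zip(p1, p2)]
    angles = [joint_angle(p1[i], p1[j], p1[k]) for i, j, k in triplets]
    return dists + angles
```

Stacking one such vector per frame yields the kind of sequence a BiLSTM consumes; speeds would typically be appended per joint in the same way.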
Funding: Supported in part by the National Key R&D Program of China under Grant No. 2019YFB1803400.
Abstract: This paper proposes Flex-QUIC, an AI-empowered enhancement framework for Quick UDP Internet Connections (QUIC) that addresses the degraded transmission efficiency caused by static parameterization of the acknowledgment (ACK) mechanism, loss detection, and forward error correction (FEC) in dynamic wireless networks. Unlike the standard QUIC protocol, Flex-QUIC systematically integrates machine learning into three critical modules to achieve high-efficiency operation. First, an ACK adaptation mechanism based on a contextual multi-armed bandit optimizes the ACK ratio to reduce wireless channel contention. Second, an adaptive loss detection module uses a long short-term memory (LSTM) model to predict the reordering displacement and thereby tune the packet reordering tolerance. Third, an FEC transmission scheme jointly adjusts the redundancy level based on the LSTM-predicted loss rate and the congestion window state. Extensive evaluations across Wi-Fi, 5G, and satellite network scenarios demonstrate that Flex-QUIC significantly improves throughput and reduces latency compared with standard QUIC and other enhanced QUIC variants, highlighting its adaptability to diverse and dynamic network conditions. Finally, we discuss open issues in deploying AI-native transport protocols.
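To make the contextual-bandit idea concrete, here is a minimal sketch of ACK-ratio selection. It assumes the arms are candidate ACK ratios (acknowledge every 1, 2, 4, or 8 packets), the context is a discretized network-condition label, and the reward is some scalar score such as throughput minus a contention penalty. The abstract does not name the bandit algorithm, so a tabular epsilon-greedy policy is swapped in here; the class name, context labels, and reward are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy bandit: one running-mean reward estimate
    per (context, arm) pair."""

    def __init__(self, arms: Sequence[int], epsilon: float = 0.1, seed: int = 0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> number of pulls
        self.values = defaultdict(float)  # (context, arm) -> mean observed reward

    def select(self, context: Hashable, explore: bool = True) -> int:
        """Pick an arm: random with probability epsilon, else the best estimate."""
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[(context, a)])

    def update(self, context: Hashable, arm: int, reward: float) -> None:
        """Incrementally update the mean reward of the pulled arm."""
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```

In use, the transport would call `select` whenever it re-evaluates the ACK ratio, apply the chosen ratio for one interval, then call `update` with the measured reward for that interval. Because estimates are kept per context, the policy can settle on different ACK ratios for, say, a congested and an uncongested Wi-Fi channel.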
Funding: National Natural Science Foundation of China (Grant No. 52075068) and Shanxi Provincial Science and Technology Major Project (Grant No. 20191101014).
Abstract: The electric cable shovel (ECS) is a complex piece of production equipment widely used in open-pit mines. Rational estimation of the load is the foundation for developing intelligent or unmanned ECSs, since it directly influences the planning of digging trajectories and energy consumption. Methods for ECS load prediction fall into two types: physics-based modeling and data-driven methods. The former is based on known physical laws but is necessarily an approximation of reality because knowledge of certain processes is incomplete, which introduces bias. The latter learns features and patterns from data in an end-to-end manner without relying on domain expertise, but it requires a large amount of accurately labeled data to generalize, which introduces variance. In addition, some components of the load are latent and non-observable: they cannot be measured by the system's actual sensors, so they cannot be predicted by data-driven methods. Herein, we present an innovative hybrid physics-informed deep neural network (HPINN) architecture that combines physics-based models and data-driven methods to predict the dynamic load of the ECS. In the proposed framework, parts of the theoretical model are incorporated directly, while the difficult-to-model part is captured by training a highly expressive approximator on data. Prior physical knowledge, such as Lagrangian mechanics and the conservation of energy, is treated as additional constraints and embedded in the overall loss function, confining training to a feasible solution space. The satisfactory performance of the proposed framework is verified on both synthetic and real measurement datasets.
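The idea of embedding physics as an extra loss term can be illustrated with a deliberately simplified model. This sketch assumes a hypothetical one-degree-of-freedom vertical hoist: from the Lagrangian L = ½·m·q̇² − m·g·q, the Euler-Lagrange equation gives m·q̈ + m·g = F, so any predicted force F can be checked against measured acceleration. The hybrid prediction (physics prior plus a learned correction), the weight `lam`, and all function names are illustrative assumptions, not the HPINN formulation; in practice the correction would come from a neural network and gradients from an autodiff framework.

```python
from typing import List, Sequence

G = 9.81  # gravitational acceleration, m/s^2


def physics_force(acc: Sequence[float], mass: float) -> List[float]:
    """Force from the hypothetical 1-DOF model: F = m * (q_dd + g)."""
    return [mass * (a + G) for a in acc]


def hybrid_prediction(acc: Sequence[float], mass: float,
                      correction: Sequence[float]) -> List[float]:
    """Known physics part plus a learned correction for the hard-to-model part."""
    return [f + c for f, c in zip(physics_force(acc, mass), correction)]


def mse(values: Sequence[float]) -> float:
    return sum(v * v for v in values) / len(values)


def hybrid_loss(force_pred: Sequence[float], force_meas: Sequence[float],
                acc: Sequence[float], mass: float, lam: float = 0.1) -> float:
    """Data-fit term plus lam times the squared Euler-Lagrange residual."""
    data_term = mse([p - m for p, m in zip(force_pred, force_meas)])
    residuals = [p - f for p, f in zip(force_pred, physics_force(acc, mass))]
    return data_term + lam * mse(residuals)
```

The physics term penalizes predictions that fit the measurements but violate the equation of motion, which is how such a constraint steers training toward a physically feasible region.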
Funding: This work was supported in part by the National Natural Science Foundation of China (U21A2019, 61873058), the Hainan Province Science and Technology Special Fund of China (ZDYF2022SHFZ105), and the Alexander von Humboldt Foundation of Germany.
Abstract: Accurate detection of pipeline leakage is essential for maintaining the safety of pipeline transportation. Recently, deep learning (DL) has emerged as a promising tool for pipeline leakage detection (PLD). However, most existing DL methods struggle to identify leakage types accurately because of the complex temporal dynamics of pipeline data. Moreover, the initial parameters of the detection model are generally selected at random, which may lead to unstable recognition performance. To address these issues, this paper presents a hybrid DL framework, the parameter-optimized recurrent attention network (PRAN), to improve the accuracy of PLD. First, a parameter-optimized long short-term memory (LSTM) network is introduced to extract effective and robust features; it uses a particle swarm optimization (PSO) algorithm with a cross-entropy fitness function to search for globally optimal parameters. This improves the model's representation-learning capability and accelerates convergence. Second, an anomaly-attention mechanism (AM) is proposed to discover class-discriminative information by weighting the hidden states, which amplifies the distinguishable discrepancy between normal and abnormal conditions and further improves PLD accuracy. As a result, PRAN not only adaptively optimizes the network parameters but also enlarges the contribution of the normal-abnormal discrepancy, thereby overcoming the drawbacks of instability and poor generalization. Finally, experimental results demonstrate the effectiveness and superiority of PRAN for PLD.
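The PSO search described above can be sketched with a standard global-best particle swarm. In PRAN each particle would encode LSTM parameters and the fitness would be a cross-entropy loss; here, as an assumption for illustration, a toy quadratic stands in for that fitness, and the inertia and acceleration coefficients (`w`, `c1`, `c2`) are common textbook values rather than the paper's settings.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(fitness: Callable[[List[float]], float],
        bounds: Sequence[Tuple[float, float]],
        n_particles: int = 20, iters: int = 100,
        w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `fitness` over a box with global-best particle swarm optimization."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # best position seen by each particle
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]  # best position seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in bounds
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f
```

For example, `pso(lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2, [(-10.0, 10.0), (-10.0, 10.0)])` converges near (3, -1). Replacing the lambda with a function that trains and validates a model would be far more expensive per evaluation, which is why swarm size and iteration count matter in practice.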
Funding: Funded by the Deutsche Forschungsgemeinschaft (German Research Foundation), No. 251654672—TRR 161 (Project B01), and by Germany's Excellence Strategy, No. EXC-2075—390740016.
Abstract: In this paper, we introduce a visual analytics approach that helps machine learning experts analyze the hidden states of layers in recurrent neural networks. Our technique lets the user interactively inspect how hidden states store and process information as an input sequence is fed into the network. It can help answer questions such as which parts of the input data have a greater impact on the prediction and how the model correlates each hidden-state configuration with a certain output. The approach comprises several components. First, an input visualization shows the input sequence and, through color coding, how it relates to the output. In addition, hidden states are projected nonlinearly into a 2-D visualization space using t-distributed stochastic neighbor embedding (t-SNE) to reveal the shape of the hidden-state space. Trajectories show in detail how the hidden-state configurations evolve. Finally, a time-multi-class heatmap matrix visualizes the evolution of the expected predictions for multi-class classifiers, and a histogram shows the distances between hidden states in the original space. The visualizations are displayed simultaneously in multiple views that support brushing-and-linking, facilitating the analysis of classifications and the debugging of misclassified input sequences. To demonstrate the capability of our approach, we discuss two typical use cases of long short-term memory models applied to two widely used natural language processing datasets.
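The distance histogram and trajectory views both start from simple computations over hidden-state vectors in the original (unprojected) space. The sketch below assumes Euclidean distance and equal-width bins, neither of which the abstract specifies, and uses hypothetical helper names; the 2-D t-SNE projection itself would normally come from a library (for example, scikit-learn's `sklearn.manifold.TSNE`) and is not reproduced here.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def pairwise_distances(states: Sequence[Vector]) -> List[float]:
    """Euclidean distance between every unordered pair of hidden states."""
    return [math.dist(a, b) for a, b in combinations(states, 2)]


def trajectory_step_lengths(states: Sequence[Vector]) -> List[float]:
    """Distance between consecutive hidden states, i.e. how far each input
    token moves the network along its hidden-state trajectory."""
    return [math.dist(states[i], states[i + 1]) for i in range(len(states) - 1)]


def histogram(values: Sequence[float], n_bins: int) -> Tuple[float, float, List[int]]:
    """Equal-width histogram; returns (lower edge, bin width, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # max value goes in last bin
        counts[idx] += 1
    return lo, width, counts
```

Large step lengths flag tokens that change the network's internal state sharply, which is the kind of cue the trajectory view is meant to expose.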