To enhance speech emotion recognition capability, this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup (AAM) and improved coordinate and shuffle attention (ICASA) methods. The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients, thus enabling information fusion of speech data with different emotions at the acoustic level. The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention (ICA) and shuffle attention (SA) techniques. The ICA technique reduces computational overhead by employing depthwise separable convolution and an h-swish activation function, and it captures long-range dependencies of multi-scale time-frequency features using the attention weights. The SA technique promotes feature interaction through channel shuffling, which helps the model learn richer and more discriminative emotional features. Experimental results demonstrate that, compared to the baseline model, the proposed model improves the weighted accuracy by 5.42% and 4.54%, and the unweighted accuracy by 3.37% and 3.85%, on the IEMOCAP and RAVDESS datasets, respectively. These improvements were confirmed to be statistically significant by independent-samples t-tests, further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.
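The three building blocks named in this abstract (mixup interpolation, the h-swish activation, and channel shuffling) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the interpolation coefficient `lam` is passed in as a fixed number (the abstract says AAM chooses it dynamically but does not say how), and channels are represented as a flat Python list.

```python
def mixup(a, b, lam):
    """Blend two equal-length feature vectors: lam * a + (1 - lam) * b.

    Assumption: lam is supplied by the caller; the adaptive rule used
    by AAM is not described in the abstract.
    """
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def h_swish(x):
    """h-swish(x) = x * ReLU6(x + 3) / 6, a cheap approximation of swish."""
    relu6 = min(max(x + 3.0, 0.0), 6.0)
    return x * relu6 / 6.0


def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Equivalent to viewing the list as (groups, per_group), transposing,
    and flattening, so each output group mixes channels from every input group.
    """
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]
```

With two groups, six channels indexed 0 to 5 are reordered so that neighbouring outputs come from different groups, which is what lets grouped attention branches exchange information.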
Image registration within a solar photosphere sequence is crucial for observational solar physics studies requiring high spatial and temporal resolutions. Previously, we identified residual large-scale nonrigid distortions in high-resolution solar photosphere images from ground-based telescopes after high-resolution reconstruction. Because these distortions are not eliminated by conventional sequence correlation alignment, they can affect the analysis of small-scale activity in the solar photosphere. Here, we implemented an image registration model using deep learning (HCAM-Net) to solve the problem. Within an encoder-decoder framework, we introduced a hybrid attention mechanism to improve context information capture and extract accurate deformation fields. Analyzing solar photosphere images acquired by the New Vacuum Solar Telescope, we demonstrated that the proposed model effectively achieved highly accurate nonrigid image registration. Evaluation metrics and visualization results indicated that our model outperformed current state-of-the-art models, such as VoxelMorph and TransMorph, for nonrigid registration of solar photosphere images, with a structural similarity index measure of 0.965 and a coefficient of determination of 0.976.
With the rapid development of artificial intelligence, intelligent air combat maneuver decision-making (ACMD) has garnered global attention. Although deep reinforcement learning provides a promising approach to ACMD, existing methods often suffer from rigid reward functions and limited adaptability to evolving adversarial strategies. Moreover, most research assumes open airspace, overlooking the influence of potential obstacles. In this paper, we address one-on-one within-visual-range ACMD in obstructed environments and propose an improved Soft Actor-Critic (SAC) algorithm trained under a curriculum self-play framework. A maneuver-strategy mirroring inference module is integrated so that each aircraft can estimate the other's likely position when visual obstruction occurs. By leveraging curriculum learning to guide progressive experience accumulation and self-play for adversarial evolution, our method enhances both training efficiency and tactical diversity. We further integrate an attention mechanism that dynamically adjusts the weights of sub-rewards, enabling the learned policy to adapt to rapidly changing air combat situations. Numerical simulations demonstrate that our enhanced SAC converges more quickly and achieves higher win rates than the baseline methods. An animation is available at bilibili.com/video/BV1BHVszHE98 for better illustration.
Marine forecasting is critical for navigation safety and disaster prevention. However, traditional ocean numerical forecasting models are often limited by substantial errors and inadequate capture of temporal-spatial features. To address these limitations, this paper proposes a TimeXer-based numerical forecast correction model optimized by an exogenous-variable attention mechanism. The model treats target forecast values as internal variables, and incorporates historical temporal-spatial data and seven-day numerical forecast results from traditional models as external variables based on the embedding strategy of TimeXer. Using a self-attention structure, the model captures correlations between exogenous variables and target sequences, explores intrinsic multi-dimensional relationships, and subsequently corrects endogenous variables with the mined exogenous features. The model's performance is evaluated using metrics including MSE (Mean Squared Error), MAE (Mean Absolute Error), RMSE (Root Mean Square Error), MAPE (Mean Absolute Percentage Error), MSPE (Mean Square Percentage Error), and computational time, with the TimeXer and PatchTST models serving as benchmarks. Experimental results show that the proposed model achieves lower errors and higher correction accuracy for both one-day and seven-day forecasts.
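For readers unfamiliar with the five error metrics listed above, the following sketch computes them with their standard textbook definitions. The function name is hypothetical, MAPE and MSPE are returned as fractions rather than percentages, and the code assumes no true value is zero; none of this is taken from the paper's own evaluation code.

```python
import math


def forecast_errors(y_true, y_pred):
    """Return MSE, MAE, RMSE, MAPE and MSPE for two equal-length sequences.

    Assumptions: every y_true value is non-zero, and the percentage
    errors are reported as fractions (0.1 means 10%).
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    relative = [e / t for e, t in zip(errors, y_true)]
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(mse),
        "MAPE": sum(abs(r) for r in relative) / n,
        "MSPE": sum(r * r for r in relative) / n,
    }
```

MSE and RMSE penalize large misses more heavily than MAE, while MAPE and MSPE normalize by the true value, so they are comparable across variables with different units.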
The traditional You Only Look Once (YOLO) series network models often fail to extract satisfactory features for road detection because of the limited number of defect images in the dataset. Additionally, most open-source road crack datasets contain idealized cracks that are not suitable for detecting early-stage pavement cracks with fine widths and subtle features. To address these issues, this study collected a large number of original road surface images using road detection vehicles. A large-capacity crack dataset was then constructed, with various shapes of cracks categorized as either cracks or fractures. To improve the training performance of the YOLOv5 algorithm, which showed unsatisfactory results on the original dataset, this study used median filtering to preprocess the crack images. The preprocessed images were combined to form the training set. Moreover, the Coordinate Attention (CA) module was integrated to further enhance the model's training performance. The final detection model achieved a recognition accuracy of 88.9% and a recall rate of 86.1% for detecting cracks. These findings demonstrate that the use of image preprocessing technology and the introduction of the CA mechanism can effectively detect early-stage pavement cracks that have low contrast with the background.
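The median-filtering step mentioned above can be illustrated with a small sketch. It assumes a 3×3 window and edge replication at the borders, neither of which is stated in the abstract, and it operates on a grayscale image stored as a list of rows; the function name is hypothetical.

```python
def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2D grayscale image (list of rows).

    Assumptions: window size 3x3 and border pixels replicated at the
    edges; the paper's actual kernel size is not given in the abstract.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)  # clamp to image
                    cc = min(max(c + dc, 0), cols - 1)
                    window.append(image[rr][cc])
            out[r][c] = sorted(window)[4]  # middle of 9 values
    return out
```

Because the median ignores isolated extreme values, a single bright noise pixel is removed while thin dark structures spanning several pixels are largely preserved, which is the property that makes the filter attractive before crack detection.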
The rapid development and widespread adoption of Internet technology have significantly increased Internet traffic, highlighting the growing importance of network security. Intrusion Detection Systems (IDS) are essential for safeguarding network integrity. To address the low accuracy of existing intrusion detection models in identifying network attacks, this paper proposes an intrusion detection method based on the fusion of a Spatial Attention mechanism and a Residual Neural Network (SA-ResNet). Residual connections effectively capture local features in the data, while the spatial attention mechanism extracts the global dependency relationships of intrusion features, strengthening the model's focus on the global features of intrusions and improving the accuracy of intrusion recognition. The proposed model was experimentally verified on the NSL-KDD dataset. The experimental results show that the intrusion recognition accuracy of the SA-ResNet-based method reaches 99.86%, and its overall accuracy is 0.41% higher than that of traditional Convolutional Neural Network (CNN) models.
Pulmonary nodules represent an early manifestation of lung cancer. However, pulmonary nodules constitute only a small portion of the overall image, posing challenges for physicians in image interpretation and potentially leading to false positives or missed detections. To solve these problems, the YOLOv8 network is enhanced by adding deformable convolution and atrous spatial pyramid pooling (ASPP), along with the integration of a coordinate attention (CA) mechanism. This allows the network to focus on small targets while expanding the receptive field without losing resolution. At the same time, context information on the target is gathered and feature expression is enhanced by attention modules in different directions. The enhanced network improves localization accuracy and achieves good results on the LUNA16 dataset. Compared with other detection algorithms, it improves the accuracy of pulmonary nodule detection to a certain extent.
Unsteady aerodynamic characteristics at high angles of attack are of great importance to the design and development of advanced fighter aircraft, which are characterized by post-stall maneuverability with multiple Degrees-of-Freedom (multi-DOF) and complex flow field structure. In this paper, a special kind of cable-driven parallel mechanism is first used as a new suspension method to conduct unsteady dynamic wind tunnel tests at high angles of attack, thereby providing experimental aerodynamic data. These tests include a wide range of multi-DOF coupled oscillatory motions with various amplitudes and frequencies. Then, for aerodynamic modeling and analysis, a novel data-driven Feature-Level Attention Recurrent neural network (FLAR) is proposed. This model incorporates a specially designed feature-level attention module that focuses on the state variables affecting the aerodynamic coefficients, thereby enhancing the physical interpretability of the aerodynamic model. Subsequently, spin maneuver simulations, using a mathematical model as the baseline, are conducted to validate the effectiveness of the FLAR. Finally, the results on wind tunnel data reveal that the FLAR accurately predicts aerodynamic coefficients, and observations through the visualization of attention scores identify the key state variables that affect the aerodynamic coefficients. It is concluded that the proposed FLAR enhances the interpretability of the aerodynamic model while achieving good prediction accuracy and generalization capability for multi-DOF coupling motion at high angles of attack.
To address the high stochasticity and uncertainty of distributed photovoltaic power generation, this paper presents a distributed photovoltaic ultra-short-term power forecasting method based on Variational Mode Decomposition (VMD) and a channel attention mechanism. First, Pearson's correlation coefficient is used to select the meteorological factors that have a high impact on historical power. Second, the distributed PV power data are decomposed by VMD into relatively smooth power series with different fluctuation patterns. Finally, the reconstructed distributed PV power and the other features are input into the combined CNN-SENet-BiLSTM model. In this model, the convolutional neural network (CNN) and the channel attention mechanism dynamically adjust the weights while capturing the spatial features of the input data, improving the discriminative ability of key features. The extracted features are then fed into the bidirectional long short-term memory network (BiLSTM) to capture the time-series features, and the final output is the prediction result. The method is verified using a dataset from a distributed photovoltaic power station in the Northwest region of China. The results show that, compared with other prediction methods, the proposed method has higher prediction accuracy, which helps to increase the proportion of distributed PV connected to the grid and supports the safe and stable operation of the power grid.
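The first step of this pipeline, screening meteorological factors by Pearson's correlation with historical power, might look like the sketch below. The feature names, the threshold value, and the choice to compare the absolute correlation against the threshold are all assumptions made for illustration; the abstract does not give the authors' selection rule.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_features(features, power, threshold=0.5):
    """Keep the features whose |r| with power reaches the threshold.

    Assumption: threshold=0.5 is an illustrative default, not a value
    taken from the paper.
    """
    return [name for name, values in features.items()
            if abs(pearson(values, power)) >= threshold]
```

A hypothetical call with an irradiance series that tracks power and a humidity series that does not would keep only the irradiance column, which then joins the decomposed power series as model input.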
The objective of image-based virtual try-on is to seamlessly integrate clothing onto a target image, generating a realistic representation of the character in the specified attire. However, existing virtual try-on methods frequently encounter challenges, including misalignment between the body and clothing, noticeable artifacts, and the loss of intricate garment details. To overcome these challenges, we introduce a two-stage high-resolution virtual try-on framework that integrates an attention mechanism, comprising a garment warping stage and an image generation stage. During the garment warping stage, we incorporate a channel attention mechanism to effectively retain the critical features of the garment, addressing challenges such as the loss of patterns, colors, and other essential details commonly observed in virtual try-on images produced by existing methods. During the image generation stage, to make full use of the information provided by the input image, the input features undergo double sampling within the normalization procedure, thereby improving the detail fidelity and clothing alignment of the output image. Experimental evaluations conducted on high-resolution datasets validate the effectiveness of the proposed method. Results demonstrate significant improvements in preserving garment details, reducing artifacts, and achieving superior alignment between the clothing and body compared to baseline methods, establishing its advantage in generating realistic and high-quality virtual try-on images.
Microseismic monitoring technology is widely used in tunnel and coal mine safety production. For signals generated by ultra-weak microseismic events, traditional sensors encounter limitations in terms of detection sensitivity. Given the complex engineering environment, automatic multi-class classification of microseismic data is highly desirable. In this study, we use acceleration sensors to collect signals and combine the improved Visual Geometry Group network with a convolutional block attention module to obtain a new network structure, termed CNN_BAM, for automatic classification and identification of microseismic events. We use the dataset collected from the Hanjiang-to-Weihe River Diversion Project to train and validate the network model. Results show that the CNN_BAM model exhibits good feature extraction ability, achieving a recognition accuracy of 99.29% and surpassing all its counterparts. The stability and accuracy of the classification algorithm improve remarkably. In addition, through fine-tuning and migration to the Pan Ⅱ Mine Project, the network demonstrates reliable generalization performance. This outcome reflects its adaptability across different projects and promising application prospects.
This research proposes an innovative solution to the inherent challenges faced by data-driven landslide displacement prediction models, such as the need for extensive historical datasets for training, the reliance on manual feature selection, and the difficulty of effectively utilizing landslide historical data. We develop a dual-channel deep learning prediction model that integrates multimodal decomposition and an attention mechanism to overcome these challenges and improve prediction performance. The proposed methodology follows a three-stage framework: (1) Empirical Mode Decomposition (EMD) decomposes the cumulative displacement and feature factors; (2) a Double Exponential Smoothing (DES) ensemble optimized through the Non-dominated Sorting Genetic Algorithm II (NSGA-II) enhances trend prediction, while a Bidirectional Long Short-Term Memory-Radial Basis Function (BiLSTM-RBF) network enhanced by a hybrid attention mechanism takes a global-local synergistic approach to hierarchical feature extraction, thereby improving the prediction of periodic displacements; (3) a bidirectional adaptive feature extraction mechanism aligns attention weights with BiLSTM propagation paths through spatial mapping, complemented by an innovative loss function incorporating Prediction Interval (PI) width optimization. In comparative experiments on the Baishuihe landslide, the RMSE, MAE, and R² metrics at monitoring point ZG118 improve by 19.8%, 35.2%, and 3.2% relative to the best baseline model (RBF-MIC); at monitoring point ZG93, where less data are available, the three metrics improve by 52.1%, 32.3%, and 21.8% relative to the best baseline model (GRU-None). These results substantiate the model's capacity to overcome the dual constraints of data paucity and feature engineering limitations in geohazard prediction.
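Double exponential smoothing, used above for the trend component, can be sketched with Holt's linear-trend formulation, which is one common variant of the technique. The smoothing factors `alpha` and `beta` are passed in by hand here; the abstract says they are tuned with NSGA-II, which this illustration does not attempt, and the function name and initialization are assumptions.

```python
def double_exponential_smoothing(series, alpha, beta, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear-trend smoothing.

    Assumptions: at least two observations, level initialized to the
    first value and trend to the first difference; alpha and beta are
    fixed by the caller rather than optimized.
    """
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step forecast.
        level = alpha * value + (1.0 - alpha) * (level + trend)
        # Blend the newly observed slope with the previous trend estimate.
        trend = beta * (level - previous_level) + (1.0 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]
```

On a perfectly linear displacement record the level and trend track the data exactly, so the forecast simply continues the line; on real monitoring data the two factors control how quickly the estimate reacts to changes in creep rate.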
Mineral identification is foundational to geological survey research, mineral resource exploration, and mining engineering. Considering the diversity of mineral types and the challenge of achieving high recognition accuracy for similar features, this study introduces a mineral detection method based on YOLOv8-SBI. This work enhances feature extraction by integrating spatial pyramid pooling-fast (SPPF) with the simplified self-attention module (SimAM), significantly improving the precision of mineral feature detection. In the feature fusion network, a weighted bidirectional feature pyramid network is employed for advanced cross-channel feature integration, effectively reducing feature redundancy. Additionally, Inner-Intersection over Union (InnerIOU) is used as the loss function to improve the average quality localization performance of anchor boxes. Experimental results show that the YOLOv8-SBI model achieves an accuracy of 67.9%, a recall of 74.3%, a mAP@0.5 of 75.8%, and a mAP@0.5:0.95 of 56.7%, with a real-time detection speed of 244.2 frames per second. Compared to YOLOv8, YOLOv8-SBI demonstrates a significant improvement, with a 15.4% increase in accuracy, a 28.5% increase in recall, and increases of 28.1% and 20.9% in mAP@0.5 and mAP@0.5:0.95, respectively. Furthermore, relative to other models, such as YOLOv3, YOLOv5, YOLOv6, YOLOv8, YOLOv9, and YOLOv10, YOLOv8-SBI has a smaller parameter size of only 3.01×10^6. This highlights its balance between detection accuracy and speed, thereby offering robust technical support for intelligent mineral classification.
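As a rough illustration of the overlap measure behind the loss named above, the sketch below computes plain IoU for axis-aligned boxes and then an "inner" variant that evaluates IoU on auxiliary boxes scaled about each box's center. The scaling formulation and the `ratio` parameter reflect a general reading of the Inner-IoU idea and are assumptions; the abstract does not give the exact loss used in YOLOv8-SBI.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Scale a box about its center by `ratio` (hypothetical helper)."""
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    half_w = (box[2] - box[0]) * ratio / 2.0
    half_h = (box[3] - box[1]) * ratio / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def inner_iou(a, b, ratio):
    """IoU computed on center-scaled auxiliary boxes (assumed formulation)."""
    return iou(scale_box(a, ratio), scale_box(b, ratio))
```

With `ratio` equal to 1 the measure reduces to ordinary IoU; smaller ratios make it stricter and larger ratios more lenient, which changes how strongly partially overlapping predictions are rewarded during regression.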
Waveforms of artificially induced explosions and collapse events recorded by the seismic network share similarities with natural earthquakes. Failure to identify and screen them in a timely manner can introduce confusion into the earthquake catalog established using these recordings, thereby impacting future seismological research. Therefore, the identification and separation of natural earthquakes from continuous seismic signals contribute to the monitoring and early warning of destructive tectonic earthquakes. A 1D convolutional neural network (CNN) is proposed for seismic event classification using an efficient channel attention mechanism and an improved light inception block. A total of 9937 seismic sample records are obtained after waveform interception, filtering, and normalization. The proposed model can obtain better classification performance than other major existing methods, exhibiting 96.79% overall classification accuracy and 96.73%, 94.85%, and 96.35% classification accuracy for natural seismic events, collapse events, and blasting events, respectively. Meanwhile, the proposed model is lighter than the 2D convolutional and common inception networks. We also apply the proposed model to the seismic data recorded at the University of Utah seismograph stations and compare its performance with that of the CNN-waveform model.
With the development of artificial intelligence and deep learning, the attention mechanism has become a key technology for enhancing the performance of complex tasks. This paper reviews the evolution of attention mechanisms, including soft attention, hard attention, and recent innovations such as multi-head latent attention and cross-attention. It focuses on the latest research outcomes, such as lightning attention, the PADRe polynomial attention replacement algorithm, the context anchor attention module, and improvements in attention mechanisms for large models. These advancements improve the efficiency and accuracy of models, expanding the application potential of attention mechanisms in fields such as computer vision, natural language processing, and remote sensing object detection. The review aims to provide readers with a comprehensive understanding and to stimulate innovative thinking.
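Soft attention, the starting point of this review, can be sketched as a softmax-weighted average of value vectors. The sketch assumes dot-product scoring and plain Python lists; the function name is hypothetical, and the variants surveyed in the paper differ in how scores are computed and normalized.

```python
import math


def soft_attention(query, keys, values):
    """Return (weights, context) for dot-product soft attention.

    Assumption: similarity is the raw dot product between the query and
    each key; other scoring functions are equally valid.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(dim)]
    return weights, context
```

Every value contributes in proportion to its weight, which keeps the operation differentiable; hard attention instead selects a single value and therefore needs sampling-based training.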
Infrared imaging technology has been widely adopted in various fields, such as military reconnaissance, medical diagnosis, and security monitoring, due to its excellent ability to penetrate smoke and fog. However, the prevalent low resolution of infrared images severely limits the accurate interpretation of their contents. In addition, deploying super-resolution models on resource-constrained devices faces significant challenges. To address these issues, this study proposes a lightweight super-resolution network for infrared images based on an adaptive attention mechanism. The network's dynamic weighting module automatically adjusts the weights of the attention and non-attention branch outputs based on the network's characteristics at different levels. The attention branch is further subdivided into pixel attention and brightness-texture attention, which are specialized for extracting the most informative features in infrared images. Meanwhile, the non-attention branch supplements the extraction of neglected features to make the feature set more comprehensive. Ablation experiments verify the effectiveness of the proposed module. Finally, qualitative and quantitative results from experiments on two datasets, FLIR and Thermal101, demonstrate that the model can effectively recover high-frequency details of infrared images and significantly improve image resolution. Compared with the second-best method, the proposed network reduces the number of parameters by 30% while improving model performance. When the scale factor is 2, the peak signal-to-noise ratio on the FLIR and Thermal101 test datasets is improved by 0.09 and 0.15 dB, respectively. When the scale factor is 4, it is improved by 0.05 and 0.09 dB, respectively. In addition, owing to the lightweight design of the network structure, the model has a low computational cost and is suitable for deployment on edge devices, thus effectively enhancing the sensing performance of infrared imaging devices.
During the operation, maintenance and upkeep of concrete buildings, surface cracks are often regarded as important warning signs of potential damage. Their precise segmentation plays a key role in assessing the health of a building. Traditional manual inspection is subjective and inefficient, and it poses safety hazards. In contrast, current mainstream computer vision-based crack segmentation methods still suffer from missed detections, false detections, and segmentation discontinuities. These problems are particularly evident when dealing with small cracks, complex backgrounds, and blurred boundaries. For this reason, this paper proposes a lightweight building surface crack segmentation method, HL-YOLO, based on YOLOv11n-seg, which integrates an attention mechanism and a dilation-wise residual structure. First, we design a lightweight backbone network, RCSAA-Net, which combines ResNet50, capable of multi-scale feature extraction, with a custom Channel-Spatial Aggregation Attention (CSAA) module. This design boosts the model's capacity to extract features of fine cracks and complex backgrounds. The CSAA module enhances the model's attention to critical crack areas by capturing global dependencies in feature maps. Second, we construct an enhanced Content-aware ReAssembly of FEatures (ProCARAFE) module. It introduces a larger receptive field and a dynamic kernel generation mechanism to achieve the reconstruction and accurate restoration of crack edge details. Finally, a Dilation-wise Residual (DWR) structure is introduced to reconstruct the C3k2 modules in the neck. It enhances multi-scale feature extraction and long-range contextual information fusion capabilities through multi-rate depthwise dilated convolutions. The improved model's superiority and generalization ability have been validated through experiments on the self-built dataset. Compared to the baseline model, HL-YOLO improves mean Average Precision at 0.5 IoU by 4.1% and increases the mean Intersection over Union (mIoU) by 4.86%, with only 3.12 million parameters. These results indicate that HL-YOLO can efficiently and accurately identify cracks on building surfaces, meeting the demand for rapid detection and providing an effective technical solution for real-time crack monitoring.
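The mIoU figure reported above is a pixel-level metric. A minimal sketch, assuming masks flattened to lists of integer class labels (0 for background, 1 for crack) and averaging only over classes that appear in either mask, could look like this; the function name and the handling of absent classes are assumptions, not the paper's evaluation code.

```python
def mean_iou(pred, truth, num_classes=2):
    """Mean per-class IoU of two flattened label masks.

    Assumptions: labels are integers in [0, num_classes), and classes
    absent from both masks are skipped rather than counted as 0 or 1.
    """
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

Because cracks occupy few pixels, the crack-class IoU moves much more than the background IoU when a thin segment is missed, which is why mIoU is more informative than pixel accuracy for this task.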
As the group-buying model shows significant progress in attracting new users, enhancing user engagement, and increasing platform profitability, providing personalized recommendations for group-buying users has emerged as a new challenge in the field of recommendation systems. This paper introduces a group-buying recommendation model based on multi-head attention mechanisms and multi-task learning, termed the Multi-head Attention Mechanisms and Multi-task Learning Group-Buying Recommendation (MAMGBR) model, specifically designed to optimize group-buying recommendations on e-commerce platforms. The core dataset of this study comes from the Chinese maternal and infant e-commerce platform "Beibei," encompassing approximately 430,000 successful group-buying actions and over 120,000 users. The model focuses on two main tasks: recommending items for group organizers (Task I) and recommending participants for a given group-buying event (Task II). In model evaluation, MAMGBR achieves an MRR@10 of 0.7696 for Task I, marking a 20.23% improvement over baseline models. Furthermore, in Task II, where complex interaction patterns prevail, MAMGBR utilizes auxiliary loss functions to effectively model the multifaceted roles of users, items, and participants, leading to a 24.08% increase in MRR@100 under a 1:99 sample ratio. Experimental results show that, compared to benchmark models such as NGCF and EATNN, MAMGBR's integration of multi-head attention mechanisms, expert networks, and gating mechanisms enables more accurate modeling of user preferences and social associations within group-buying scenarios, significantly enhancing recommendation accuracy and platform group-buying success rates.
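The MRR@k metric quoted for both tasks rewards placing the correct item near the top of a ranked list. A minimal sketch with the standard definition follows, assuming one relevant item per query; the function name and the toy item identifiers are illustrative only.

```python
def mrr_at_k(rankings, targets, k):
    """Mean reciprocal rank, counting only hits within the top k.

    Assumption: each query has exactly one relevant item; a miss
    within the top k contributes 0.
    """
    total = 0.0
    for ranking, target in zip(rankings, targets):
        top_k = ranking[:k]
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)
    return total / len(rankings)
```

A hit at rank 1 scores 1, at rank 2 scores 0.5, and so on, so an MRR@10 near 0.77 means the correct item typically appears in the first or second position.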
Human pose estimation has received much attention from the research community because of its wide range of applications. However, current pose estimation methods are usually complex and computationally intensive, and they suffer from feature loss during feature fusion. To address these problems, we propose a lightweight human pose estimation network based on a multi-attention mechanism (LMANet). In our method, network parameters are significantly reduced by lightweighting the bottleneck blocks of the high-resolution network with depthwise separable convolution. We also introduce a multi-attention mechanism to improve the model's prediction accuracy: a channel attention module is added in the initial stage of the network to enhance local cross-channel information interaction. More importantly, we inject a spatial cross-awareness module in the multi-scale feature fusion stage to reduce the spatial information loss during feature extraction. Extensive experiments on the COCO2017 and MPII datasets show that LMANet achieves higher prediction accuracy with fewer network parameters and less computation. Compared with the high-resolution network HRNet, the number of parameters and the computational complexity of the network are reduced by 67% and 73%, respectively.
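The parameter savings from depthwise separable convolution follow from simple arithmetic, sketched below with biases ignored. The layer sizes in the usage note are illustrative and are not taken from LMANet or HRNet.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 conv mixes the channels
    return depthwise + pointwise
```

For a hypothetical 3×3 layer with 64 input and 128 output channels, the standard convolution needs 73,728 weights and the separable version 8,768, roughly an eightfold reduction for that layer; the overall saving in a full network depends on which layers are replaced.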
Tropical cyclones (TCs) are complex and powerful weather systems, and accurately forecasting their path, structure, and intensity remains a critical focus and challenge in meteorological research. In this paper, we propose an Attention Spatio-Temporal predictive Generative Adversarial Network (AST-GAN) model for predicting the temporal and spatial distribution of TCs. The model forecasts the spatial distribution of TC wind speeds for the next 15 hours at 3-hour intervals, emphasizing the cyclone's center, high wind-speed areas, and its asymmetric structure. To effectively capture spatiotemporal feature transfer at different time steps, we employ a channel attention mechanism for feature selection, enhancing model performance and reducing parameter redundancy. We utilized High-Resolution Weather Research and Forecasting (HWRF) data to train our model, allowing it to assimilate a wide range of TC motion patterns. The model is versatile and can be applied to various complex scenarios, such as multiple TCs moving simultaneously or TCs approaching landfall. Our proposed model demonstrates superior forecasting performance, achieving a root-mean-square error (RMSE) of 0.71 m/s for overall wind speed and 2.74 m/s for maximum wind speed when benchmarked against ground truth data from HWRF. Furthermore, the model underwent optimization and independent testing using ERA5 reanalysis data, showcasing its stability and scalability. After fine-tuning on the ERA5 dataset, the model achieved an RMSE of 1.33 m/s for wind speed and 1.75 m/s for maximum wind speed. The AST-GAN model outperforms other state-of-the-art models in RMSE on both the HWRF and ERA5 datasets, maintaining its superior performance and demonstrating its effectiveness for spatiotemporal prediction of TCs.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 12204062 and the Natural Science Foundation of Shandong Province under Grant No. ZR2022MF330.
Abstract: To enhance speech emotion recognition capability, this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup (AAM) and improved coordinate and shuffle attention (ICASA) methods. The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients, thus enabling information fusion of speech data with different emotions at the acoustic level. The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention (ICA) and shuffle attention (SA) techniques. The ICA technique reduces computational overhead by employing depth-separable convolution and an h-swish activation function and captures long-range dependencies of multi-scale time-frequency features using the attention weights. The SA technique promotes feature interaction through channel shuffling, which helps the model learn richer and more discriminative emotional features. Experimental results demonstrate that, compared to the baseline model, the proposed model improves the weighted accuracy by 5.42% and 4.54%, and the unweighted accuracy by 3.37% and 3.85% on the IEMOCAP and RAVDESS datasets, respectively. These improvements were confirmed to be statistically significant by independent samples t-tests, further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.
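The abstract above names two standard building blocks, the h-swish activation and channel shuffling. The sketch below shows their commonly used definitions in plain Python; it is not the paper's ICA or SA module, and the function names `h_swish` and `channel_shuffle` are illustrative assumptions.

```python
def h_swish(x: float) -> float:
    """Hard-swish in its usual form: x * ReLU6(x + 3) / 6."""
    relu6 = min(max(x + 3.0, 0.0), 6.0)
    return x * relu6 / 6.0


def channel_shuffle(channels: list, groups: int) -> list:
    """Interleave channels across groups.

    Equivalent to viewing the channel axis as (groups, per_group),
    transposing to (per_group, groups), and flattening again.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


if __name__ == "__main__":
    print(h_swish(3.0))
    print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
```

After shuffling, each consecutive pair of channels contains one channel from every original group, which is how the operation lets grouped features interact.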
Funding: Funded by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0560000), the National Natural Science Foundation of China (12473054), the Basic Research on Fund Projects in Yunnan Province (2019FA001), and the Yunnan Province Science Foundation Project (202105AC160085).
Abstract: Image registration within a solar photosphere sequence is crucial for observational solar physics studies requiring high spatial and temporal resolutions. Previously, we identified residual large-scale nonrigid distortions in high-resolution solar photosphere images from ground-based telescopes after high-resolution reconstruction. Because these distortions are not eliminated by conventional sequence correlation alignment, they can affect the analysis of small-scale activity in the solar photosphere. Here, we implemented an image registration model using deep learning (HCAM-Net) to solve the problem. Within an encoder-decoder framework, we introduced a hybrid attention mechanism to improve context information capture and extract accurate deformation fields. Analyzing solar photosphere images acquired by the New Vacuum Solar Telescope, we demonstrated that the proposed model effectively achieved highly accurate nonrigid image registration. Evaluation metrics and visualization results indicated that our model outperformed current state-of-the-art models, such as VoxelMorph and TransMorph, for nonrigid registration of solar photosphere images, with a structural similarity index measure of 0.965 and a coefficient of determination of 0.976.
Funding: Support of the National Key Research and Development Plan (No. 2021YFB3302501), the financial support of the National Science Foundation of China (No. 12161076), and the financial support of the Fundamental Research Funds for the Central Universities (No. DUT25GF207).
Abstract: With the rapid development of artificial intelligence, intelligent air combat maneuver decision-making (ACMD) has garnered global attention. Although deep reinforcement learning provides a promising approach to ACMD, existing methods often suffer from rigid reward functions and limited adaptability to evolving adversarial strategies. Moreover, most research assumes open airspace, overlooking the influence of potential obstacles. In this paper, we address one-on-one within-visual-range ACMD in obstructed environments, and propose an improved Soft Actor-Critic (SAC) algorithm trained under a curriculum self-play framework. A maneuver strategy mirroring inference module is integrated to estimate each other's likely positions when visual obstruction occurs. By leveraging curriculum learning to guide progressive experience accumulation and self-play for adversarial evolution, our method enhances both training efficiency and tactical diversity. We further integrate an attention mechanism that dynamically adjusts the weights of sub-rewards, enabling the learned policy to adapt to rapidly changing air combat situations. Numerical simulations demonstrate that our enhanced SAC converges more quickly and achieves higher win rates than other baseline methods. An animation is available at bilibili.com/video/BV1BHVszHE98 for better illustration.
Funding: Supported by the National Key Research and Development Program Project (2023YFC3107804), the Planning Fund Project of Humanities and Social Sciences Research of the Ministry of Education (24YJA880097), and the Graduate Education Reform Project in North China University of Technology (217051360025XN095-17).
Abstract: Marine forecasting is critical for navigation safety and disaster prevention. However, traditional ocean numerical forecasting models are often limited by substantial errors and inadequate capture of temporal-spatial features. To address these limitations, the paper proposes a TimeXer-based numerical forecast correction model optimized by an exogenous-variable attention mechanism. The model treats target forecast values as internal variables, and incorporates historical temporal-spatial data and seven-day numerical forecast results from traditional models as external variables based on the embedding strategy of TimeXer. Using a self-attention structure, the model captures correlations between exogenous variables and target sequences, explores intrinsic multi-dimensional relationships, and subsequently corrects endogenous variables with the mined exogenous features. The model's performance is evaluated using metrics including MSE (Mean Squared Error), MAE (Mean Absolute Error), RMSE (Root Mean Square Error), MAPE (Mean Absolute Percentage Error), MSPE (Mean Square Percentage Error), and computational time, with TimeXer and PatchTST models serving as benchmarks. Experimental results show that the proposed model achieves lower errors and higher correction accuracy for both one-day and seven-day forecasts.
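The five error metrics listed in the abstract above have standard textbook definitions, sketched below in plain Python. The function name `forecast_errors` and the sample values are illustrative assumptions, and percentage errors are returned as fractions rather than multiplied by 100.

```python
from math import sqrt


def forecast_errors(actual, predicted):
    """Return MSE, MAE, RMSE, MAPE and MSPE for two equal-length sequences.

    MAPE and MSPE divide by the actual value, so every entry of
    `actual` must be non-zero.
    """
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    relative = [r / a for r, a in zip(residuals, actual)]
    mse = sum(r * r for r in residuals) / n
    return {
        "MSE": mse,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": sqrt(mse),
        "MAPE": sum(abs(q) for q in relative) / n,
        "MSPE": sum(q * q for q in relative) / n,
    }


if __name__ == "__main__":
    # Made-up values, purely for illustration.
    print(forecast_errors([2.0, 4.0], [1.0, 5.0]))
```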
Funding: Jointly supported by the National Natural Science Foundation of China (No. 52308332) and the China Postdoctoral Science Foundation (Grant No. 2022M712787).
Abstract: The traditional You Only Look Once (YOLO) series network models often fail to extract satisfactory features for road detection, due to the limited number of defect images in the dataset. Additionally, most open-source road crack datasets contain idealized cracks that are not suitable for detecting early-stage pavement cracks with fine widths and subtle features. To address these issues, this study collected a large number of original road surface images using road detection vehicles. A large-capacity crack dataset was then constructed, with various shapes of cracks categorized as either cracks or fractures. To improve the training performance of the YOLOv5 algorithm, which showed unsatisfactory results on the original dataset, this study used median filtering to preprocess the crack images. The preprocessed images were combined to form the training set. Moreover, the Coordinate Attention (CA) module was integrated to further enhance the model's training performance. The final detection model achieved a recognition accuracy of 88.9% and a recall rate of 86.1% for detecting cracks. These findings demonstrate that the use of image preprocessing technology and the introduction of the CA attention mechanism can effectively detect early-stage pavement cracks that have low contrast with the background.
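Median filtering, the preprocessing step mentioned above, replaces each pixel with the median of its neighbourhood. The sketch below is a minimal pure-Python version; the window size, the border handling (clamping to the nearest edge pixel), and the name `median_filter` are assumptions, since the abstract does not state the settings used.

```python
from statistics import median


def median_filter(image, size=3):
    """Apply a size x size median filter to a 2D list of pixel values.

    Coordinates outside the image are clamped to the nearest edge pixel.
    `size` is expected to be odd so the window has a single centre.
    """
    radius = size // 2
    height, width = len(image), len(image[0])
    output = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            output[y][x] = median(window)
    return output


if __name__ == "__main__":
    # A single bright outlier on a uniform background is removed.
    noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
    print(median_filter(noisy))
```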
Funding: Supported by the National Natural Science Foundation of China (62473341), the Key Research and Development Special Project of Henan Province (221111210500), and the Key Research and Development Special Project of Henan Province (242102211071, 242102210142, 232102211053).
Abstract: The rapid development and widespread adoption of Internet technology have significantly increased Internet traffic, highlighting the growing importance of network security. Intrusion Detection Systems (IDS) are essential for safeguarding network integrity. To address the low accuracy of existing intrusion detection models in identifying network attacks, this paper proposes an intrusion detection method based on the fusion of a Spatial Attention mechanism and a Residual Neural Network (SA-ResNet). Residual connections effectively capture local features in the data; the spatial attention mechanism extracts the global dependency relationships of intrusion features, strengthening the model's focus on the global features of intrusions and improving the accuracy of intrusion recognition. The proposed model was experimentally verified on the NSL-KDD dataset. The experimental results show that the intrusion recognition accuracy of the SA-ResNet-based method reaches 99.86%, and its overall accuracy is 0.41% higher than that of traditional Convolutional Neural Network (CNN) models.
Abstract: Pulmonary nodules represent an early manifestation of lung cancer. However, pulmonary nodules only constitute a small portion of the overall image, posing challenges for physicians in image interpretation and potentially leading to false positives or missed detections. To solve these problems, the YOLOv8 network is enhanced by adding deformable convolution and atrous spatial pyramid pooling (ASPP), along with the integration of a coordinate attention (CA) mechanism. This allows the network to focus on small targets while expanding the receptive field without losing resolution. At the same time, context information on the target is gathered and feature expression is enhanced by attention modules in different directions. The enhanced network effectively improves positioning accuracy and achieves good results on the LUNA16 dataset. Compared with other detection algorithms, it improves the accuracy of pulmonary nodule detection to a certain extent.
Funding: Supported by the National Natural Science Foundation of China (Nos. 12172315, 12072304, 11702232), the Fujian Provincial Natural Science Foundation, China (No. 2021J01050), and the Aeronautical Science Foundation of China (No. 20220013068002).
Abstract: Unsteady aerodynamic characteristics at high angles of attack are of great importance to the design and development of advanced fighter aircraft, which are characterized by post-stall maneuverability with multiple Degrees-of-Freedom (multi-DOF) and complex flow field structure. In this paper, a special kind of cable-driven parallel mechanism is first utilized as a new suspension method to conduct unsteady dynamic wind tunnel tests at high angles of attack, thereby providing experimental aerodynamic data. These tests include a wide range of multi-DOF coupled oscillatory motions with various amplitudes and frequencies. Then, for aerodynamic modeling and analysis, a novel data-driven Feature-Level Attention Recurrent neural network (FLAR) is proposed. This model incorporates a specially designed feature-level attention module that focuses on the state variables affecting the aerodynamic coefficients, thereby enhancing the physical interpretability of the aerodynamic model. Subsequently, spin maneuver simulations, using a mathematical model as the baseline, are conducted to validate the effectiveness of the FLAR. Finally, the results on wind tunnel data reveal that the FLAR accurately predicts aerodynamic coefficients, and observations through the visualization of attention scores identify the key state variables that affect the aerodynamic coefficients. It is concluded that the proposed FLAR enhances the interpretability of the aerodynamic model while achieving good prediction accuracy and generalization capability for multi-DOF coupling motion at high angles of attack.
Funding: Supported by the Inner Mongolia Power Company 2024 Staff Innovation Studio Innovation Project "Research on Cluster Output Prediction and Group Control Technology for County-Wide Distributed Photovoltaic Construction".
Abstract: To address the stochasticity and uncertainty in the power output of distributed photovoltaic (PV) power generation, this paper presents a distributed PV ultra-short-term power forecasting method based on Variational Mode Decomposition (VMD) and a channel attention mechanism. First, Pearson's correlation coefficient was used to select the meteorological factors that had a high impact on historical power. Second, the distributed PV power data were decomposed by VMD into relatively smooth power series with different fluctuation patterns. Finally, the reconstructed distributed PV power and the other features are input into the combined CNN-SENet-BiLSTM model. In this model, the convolutional neural network (CNN) and channel attention mechanism dynamically adjust the weights while capturing the spatial features of the input data to improve the discriminative ability of key features. The extracted features are then fed into the bidirectional long short-term memory network (BiLSTM) to capture the time-series features, and the final output is the prediction result. The method is verified on a dataset from a distributed photovoltaic power station in the Northwest region of China. The results show that, compared with other prediction methods, the proposed method has a higher prediction accuracy, which helps to increase the proportion of distributed PV connected to the grid and supports the safe and stable operation of the power grid.
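The first step described above, screening meteorological factors by Pearson's correlation with historical power, can be sketched as follows. The threshold of 0.5, the feature names, and the sample values are hypothetical; the abstract does not give the cut-off that was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / (norm_x * norm_y)


def select_features(features, power, threshold=0.5):
    """Keep the features whose |r| with power meets the (assumed) threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, power)) >= threshold
    ]


if __name__ == "__main__":
    # Made-up values, purely for illustration.
    power = [1.0, 2.0, 3.0, 4.0]
    candidates = {
        "irradiance": [2.0, 4.0, 6.0, 8.0],
        "noise": [1.0, -1.0, 1.0, -1.0],
    }
    print(select_features(candidates, power))
```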
Funding: Supported by the National Natural Science Foundation of China (61772179), the Hunan Provincial Natural Science Foundation of China (2022JJ50016, 2023JJ50095), the Science and Technology Plan Project of Hunan Province (2016TP1020), and the Double First-Class University Project of Hunan Province (Xiangjiaotong [2018]469, [2020]248).
Abstract: The objective of image-based virtual try-on is to seamlessly integrate clothing onto a target image, generating a realistic representation of the character in the specified attire. However, existing virtual try-on methods frequently encounter challenges, including misalignment between the body and clothing, noticeable artifacts, and the loss of intricate garment details. To overcome these challenges, we introduce a two-stage high-resolution virtual try-on framework that integrates an attention mechanism, comprising a garment warping stage and an image generation stage. During the garment warping stage, we incorporate a channel attention mechanism to effectively retain the critical features of the garment, addressing challenges such as the loss of patterns, colors, and other essential details commonly observed in virtual try-on images produced by existing methods. During the image generation stage, to make full use of the information provided by the input image, the input features undergo double sampling within the normalization procedure, thereby enhancing the detail fidelity and clothing alignment of the output image. Experimental evaluations conducted on high-resolution datasets validate the effectiveness of the proposed method. Results demonstrate significant improvements in preserving garment details, reducing artifacts, and achieving superior alignment between the clothing and body compared to baseline methods, establishing its advantage in generating realistic and high-quality virtual try-on images.
Funding: Supported by the Key Research and Development Plan of Anhui Province (202104a05020059) and the Excellent Scientific Research and Innovation Team of Anhui Province (2022AH010003). Support from the Hefei Comprehensive National Science Center is highly appreciated.
Abstract: Microseismic monitoring technology is widely used in tunnel and coal mine safety production. For signals generated by ultra-weak microseismic events, traditional sensors encounter limitations in terms of detection sensitivity. Given the complex engineering environment, automatic multi-classification of microseismic data is highly required. In this study, we use acceleration sensors to collect signals and combine the improved Visual Geometry Group with a convolutional block attention module to obtain a new network structure, termed CNN_BAM, for automatic classification and identification of microseismic events. We use the dataset collected from the Hanjiang-to-Weihe River Diversion Project to train and validate the network model. Results show that the CNN_BAM model exhibits good feature extraction ability, achieving a recognition accuracy of 99.29%, surpassing all its counterparts. The stability and accuracy of the classification algorithm improve remarkably. In addition, through fine-tuning and migration to the Pan Ⅱ Mine Project, the network demonstrates reliable generalization performance. This outcome reflects its adaptability across different projects and promising application prospects.
Funding: Supported in part by the Guizhou Province Science Technology Support Plan ([2024]General 007, [2022]General 264, [2023]General 096, [2023]General 412, and [2023]General 409), in part by the National Natural Science Foundation of China (Grant No. 61861007), in part by the Guizhou Province Science and Technology Planning Project (ZK[2021]General 303), in part by the Project of GUIYANG HYDROPOWER INVESTIGATION DESIGN & RESEARCH INSTITUTE CHECC (YJ2022-12), and in part by the Science and Technology Project of Power Construction Corporation of China, Ltd. (DJ-ZDXM-2022-44).
Abstract: This research proposes an innovative solution to the inherent challenges faced by landslide displacement prediction models based on data-driven methods, such as the need for extensive historical datasets for training, the reliance on manual feature selection, and the difficulty in effectively utilizing landslide historical data. We have developed a dual-channel deep learning prediction model that integrates multimodal decomposition and an attention mechanism to overcome these challenges and improve prediction performance. The proposed methodology follows a three-stage framework: (1) Empirical Mode Decomposition (EMD) effectively segregates cumulative displacement and feature factors; (2) a Double Exponential Smoothing (DES) ensemble optimized through the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) enhances trend prediction, while a Bidirectional Long Short-Term Memory-Radial Basis Function (BiLSTM-RBF) network enhanced by a hybrid attention mechanism facilitates a global-local synergistic approach to hierarchical feature extraction, thereby improving the prediction of periodic displacements; (3) a bidirectional adaptive feature extraction mechanism aligns attention weights with BiLSTM propagation paths through spatial mapping, complemented by an innovative loss function incorporating Prediction Interval (PI) width optimization. In the comparative experiments on the Baishuihe landslide, the RMSE, MAE, and R^(2) indexes of monitoring point ZG118 are improved by 19.8%, 35.2%, and 3.2% compared with the optimal baseline model (RBF-MIC); at monitoring point ZG93, where less data is available, the three indexes are improved by 52.1%, 32.3%, and 21.8% compared with the optimal baseline model (GRU-None). These results substantiate the model's capacity to overcome the dual constraints of data paucity and feature engineering limitations in geohazard prediction.
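Double exponential smoothing, used above for the trend component, is commonly implemented in Holt's two-parameter form. The sketch below assumes that form; the abstract does not say which DES variant or initialisation the authors use, and the smoothing factors here are fixed by hand rather than tuned by NSGA-II. The name `holt_forecast` is illustrative.

```python
def holt_forecast(series, alpha, beta, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear-trend smoothing.

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialisation: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + horizon * trend


if __name__ == "__main__":
    # On a perfectly linear series the method extrapolates the line.
    print(holt_forecast([1.0, 2.0, 3.0, 4.0, 5.0], alpha=0.5, beta=0.5))
```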
Funding: Supported by the National Natural Science Foundation of China (42202175).
Abstract: Mineral identification is foundational to geological survey research, mineral resource exploration, and mining engineering. Considering the diversity of mineral types and the challenge of achieving high recognition accuracy for similar features, this study introduces a mineral detection method based on YOLOv8-SBI. This work enhances feature extraction by integrating spatial pyramid pooling-fast (SPPF) with the simplified self-attention module (SimAM), significantly improving the precision of mineral feature detection. In the feature fusion network, a weighted bidirectional feature pyramid network is employed for advanced cross-channel feature integration, effectively reducing feature redundancy. Additionally, Inner-Intersection Over Union (InnerIOU) is used as the loss function to improve the average quality localization performance of anchor boxes. Experimental results show that the YOLOv8-SBI model achieves an accuracy of 67.9%, a recall of 74.3%, a mAP@0.5 of 75.8%, and a mAP@0.5:0.95 of 56.7%, with a real-time detection speed of 244.2 frames per second. Compared to YOLOv8, YOLOv8-SBI demonstrates a significant improvement, with a 15.4% increase in accuracy, a 28.5% increase in recall, and increases of 28.1% and 20.9% in mAP@0.5 and mAP@0.5:0.95, respectively. Furthermore, relative to other models, such as YOLOv3, YOLOv5, YOLOv6, YOLOv8, YOLOv9, and YOLOv10, YOLOv8-SBI has a smaller parameter size of only 3.01×10^(6). This highlights the optimal balance between detection accuracy and speed, thereby offering robust technical support for intelligent mineral classification.
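As a rough illustration of the Inner-IoU idea mentioned above, the sketch below computes IoU on auxiliary boxes that share each box's centre but are scaled by a ratio, and takes one minus that value as the loss. This is a minimal sketch under that assumption; the box format `(cx, cy, w, h)`, the default ratio of 0.75, and the function names are hypothetical, and the exact loss used in YOLOv8-SBI may combine further terms.

```python
def inner_iou(box_a, box_b, ratio=0.75):
    """IoU of two boxes after scaling each about its own centre by `ratio`.

    Boxes are (cx, cy, w, h). With ratio == 1.0 this is ordinary IoU.
    """
    def scaled_corners(box):
        cx, cy, w, h = box
        half_w, half_h = w * ratio / 2.0, h * ratio / 2.0
        return cx - half_w, cy - half_h, cx + half_w, cy + half_h

    ax1, ay1, ax2, ay2 = scaled_corners(box_a)
    bx1, by1, bx2, by2 = scaled_corners(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def inner_iou_loss(box_a, box_b, ratio=0.75):
    """Loss form: one minus the inner IoU."""
    return 1.0 - inner_iou(box_a, box_b, ratio)


if __name__ == "__main__":
    print(inner_iou((0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0), ratio=1.0))
```

A ratio below 1 shrinks both boxes, so partially overlapping boxes score lower than under plain IoU; a ratio above 1 has the opposite effect.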
Funding: Supported by the Jiangsu Provincial Key R&D Programme 261 (BE2020116, BE2022154).
Abstract: Waveforms of artificially induced explosions and collapse events recorded by the seismic network share similarities with natural earthquakes. Failure to identify and screen them in a timely manner can introduce confusion into the earthquake catalog established using these recordings, thereby impacting future seismological research. Therefore, the identification and separation of natural earthquakes from continuous seismic signals contribute to the monitoring and early warning of destructive tectonic earthquakes. A 1D convolutional neural network (CNN) is proposed for seismic event classification using an efficient channel attention mechanism and an improved light inception block. A total of 9937 seismic sample records are obtained after waveform interception, filtering, and normalization. The proposed model can obtain better classification performance than other major existing methods, exhibiting 96.79% overall classification accuracy and 96.73%, 94.85%, and 96.35% classification accuracy for natural seismic events, collapse events, and blasting events, respectively. Meanwhile, the proposed model is lighter than the 2D convolutional and common inception networks. We also apply the proposed model to the seismic data recorded at the University of Utah seismograph stations and compare its performance with that of the CNN-waveform model.
Abstract: With the development of artificial intelligence and deep learning, the attention mechanism has become a key technology for enhancing the performance of complex tasks. This paper reviews the evolution of attention mechanisms, including soft attention, hard attention, and recent innovations such as multi-head latent attention and cross-attention. It focuses on the latest research outcomes, such as lightning attention, the PADRe polynomial attention replacement algorithm, the context anchor attention module, and improvements in attention mechanisms for large models. These advancements improve the efficiency and accuracy of models, expanding the application potential of attention mechanisms in fields such as computer vision, natural language processing, and remote sensing object detection. The review aims to provide readers with a comprehensive understanding and to stimulate innovative thinking.
Funding: Funded in part by the Henan Province Key R&D Program Project, "Research and Application Demonstration of Class Ⅱ Superlattice Medium Wave High Temperature Infrared Detector Technology" under Grant No. 231111210400.
Abstract: Infrared imaging technology has been widely adopted in various fields, such as military reconnaissance, medical diagnosis, and security monitoring, due to its excellent ability to penetrate smoke and fog. However, the prevalent low resolution of infrared images severely limits the accurate interpretation of their contents. In addition, deploying super-resolution models on resource-constrained devices faces significant challenges. To address these issues, this study proposes a lightweight super-resolution network for infrared images based on an adaptive attention mechanism. The network's dynamic weighting module automatically adjusts the weights of the attention and non-attention branch outputs based on the network's characteristics at different levels. The attention branch is further subdivided into pixel attention and brightness-texture attention, which are specialized for extracting the most informative features in infrared images. Meanwhile, the non-attention branch supplements the extraction of neglected features to enhance the comprehensiveness of the features. Through ablation experiments, we verify the effectiveness of the proposed module. Finally, through experiments on two datasets, FLIR and Thermal101, qualitative and quantitative results demonstrate that the model can effectively recover high-frequency details of infrared images and significantly improve image resolution. In detail, compared with the suboptimal method, we have reduced the number of parameters by 30% and improved the model performance. When the scale factor is 2, the peak signal-to-noise ratio on the test datasets FLIR and Thermal101 is improved by 0.09 and 0.15 dB, respectively. When the scale factor is 4, it is improved by 0.05 and 0.09 dB, respectively. In addition, due to the lightweight design of the network structure, it has a low computational cost. It is suitable for deployment on edge devices, thus effectively enhancing the sensing performance of infrared imaging devices.
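The peak signal-to-noise ratio reported above follows the standard definition PSNR = 10·log10(MAX² / MSE). The sketch below assumes 8-bit images (MAX = 255) given as flat lists of pixel values; the function name `psnr` and the sample pixels are illustrative, and the evaluation protocol used in the paper is not stated in the abstract.

```python
from math import inf, log10


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    n = len(reference)
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / n
    if mse == 0:
        return inf  # identical images: PSNR is unbounded
    return 10.0 * log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Made-up pixel values, purely for illustration.
    print(psnr([100, 100], [110, 90]))
```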
Funding: Support from the Natural Science Foundation of Hunan Province (Grant No. 2024JJ8055) and the Hunan Yiduoyun Commodity Intelligence Project (Grant No. h2024-003).
Abstract: During the operation, maintenance and upkeep of concrete buildings, surface cracks are often regarded as important warning signs of potential damage. Their precise segmentation plays a key role in assessing the health of a building. Traditional manual inspection is subjective, inefficient and has safety hazards. In contrast, current mainstream computer vision-based crack segmentation methods still suffer from missed detections, false detections, and segmentation discontinuities. These problems are particularly evident when dealing with small cracks, complex backgrounds, and blurred boundaries. For this reason, this paper proposes a lightweight building surface crack segmentation method, HL-YOLO, based on YOLOv11n-seg, which integrates an attention mechanism and a dilation-wise residual structure. First, we design a lightweight backbone network, RCSAA-Net, which combines ResNet50, capable of multi-scale feature extraction, with a custom Channel-Spatial Aggregation Attention (CSAA) module. This design boosts the model's capacity to extract features of fine cracks and complex backgrounds. The CSAA module enhances the model's attention to critical crack areas by capturing global dependencies in feature maps. Second, we construct an enhanced Content-aware ReAssembly of FEatures (ProCARAFE) module. It introduces a larger receptive field and a dynamic kernel generation mechanism to achieve the reconstruction and accurate restoration of crack edge details. Finally, a Dilation-wise Residual (DWR) structure is introduced to reconstruct the C3k2 modules in the neck. It enhances multi-scale feature extraction and long-range contextual information fusion capabilities through multi-rate depthwise dilated convolutions. The improved model's superiority and generalization ability have been validated through experiments on the self-built dataset. Compared to the baseline model, HL-YOLO improves mean Average Precision at 0.5 IoU by 4.1%, and increases the mean Intersection over Union (mIoU) by 4.86%, with only 3.12 million parameters. These results indicate that HL-YOLO can efficiently and accurately identify cracks on building surfaces, meeting the demand for rapid detection and providing an effective technical solution for real-time crack monitoring.
Funding: Supported by the Key Research and Development Program of Heilongjiang Province (No. 2022ZX01A35).
Abstract: As the group-buying model shows significant progress in attracting new users, enhancing user engagement, and increasing platform profitability, providing personalized recommendations for group-buying users has emerged as a new challenge in the field of recommendation systems. This paper introduces a group-buying recommendation model based on multi-head attention mechanisms and multi-task learning, termed the Multi-head Attention Mechanisms and Multi-task Learning Group-Buying Recommendation (MAMGBR) model, specifically designed to optimize group-buying recommendations on e-commerce platforms. The core dataset of this study comes from the Chinese maternal and infant e-commerce platform "Beibei," encompassing approximately 430,000 successful group-buying actions and over 120,000 users. The model focuses on two main tasks: recommending items for group organizers (Task I) and recommending participants for a given group-buying event (Task II). In model evaluation, MAMGBR achieves an MRR@10 of 0.7696 for Task I, marking a 20.23% improvement over baseline models. Furthermore, in Task II, where complex interaction patterns prevail, MAMGBR utilizes auxiliary loss functions to effectively model the multifaceted roles of users, items, and participants, leading to a 24.08% increase in MRR@100 under a 1:99 sample ratio. Experimental results show that, compared to benchmark models such as NGCF and EATNN, MAMGBR's integration of multi-head attention mechanisms, expert networks, and gating mechanisms enables more accurate modeling of user preferences and social associations within group-buying scenarios, significantly enhancing recommendation accuracy and platform group-buying success rates.
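MRR@k, the ranking metric quoted above, averages the reciprocal rank of the correct item over all queries and counts a query as zero when the item falls outside the top k. The sketch below uses that standard definition; the function name `mrr_at_k` and the toy rankings are illustrative assumptions.

```python
def mrr_at_k(ranked_lists, targets, k):
    """Mean reciprocal rank, counting only hits within the top k positions."""
    total = 0.0
    for ranking, target in zip(ranked_lists, targets):
        for rank, item in enumerate(ranking[:k], start=1):
            if item == target:
                total += 1.0 / rank
                break  # only the first hit contributes
    return total / len(targets)


if __name__ == "__main__":
    # Made-up rankings: "b" is found at rank 2, "z" falls outside the top 2.
    rankings = [["a", "b", "c"], ["x", "y", "z"]]
    print(mrr_at_k(rankings, ["b", "z"], k=2))
```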
Funding: The National Natural Science Foundation of China (Nos. 61775139, 62072126, 61772164, and 61872242).
Abstract: Human pose estimation has received much attention from the research community because of its wide range of applications. However, current pose estimation methods are usually complex and computationally intensive, and they suffer in particular from feature loss during feature fusion. To address these problems, we propose a lightweight human pose estimation network based on a multi-attention mechanism (LMANet). In our method, network parameters are significantly reduced by lightweighting the bottleneck blocks of the high-resolution network with depth-wise separable convolution. We also introduce a multi-attention mechanism to improve the model prediction accuracy: a channel attention module is added in the initial stage of the network to enhance local cross-channel information interaction. More importantly, we inject a spatial cross-awareness module in the multi-scale feature fusion stage to reduce the spatial information loss during feature extraction. Extensive experiments on the COCO2017 and MPII datasets show that LMANet achieves higher prediction accuracy with fewer network parameters and less computational effort. Compared with the high-resolution network HRNet, the number of parameters and the computational complexity of the network are reduced by 67% and 73%, respectively.
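The parameter saving from depth-wise separable convolution mentioned above can be checked with simple arithmetic: a standard k×k convolution has k·k·C_in·C_out weights, while the separable version has k·k·C_in (depthwise) plus C_in·C_out (pointwise). The sketch below ignores biases, and the layer sizes are illustrative rather than LMANet's actual configuration.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weight count of a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weight count of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer: 3x3 kernel, 64 input and 64 output channels.
    std = standard_conv_params(3, 64, 64)
    sep = separable_conv_params(3, 64, 64)
    print(std, sep, f"{1 - sep / std:.1%} fewer weights")
```

For this single layer the separable form needs roughly an eighth of the weights; the 67% figure in the abstract refers to the whole network, where only some blocks are replaced.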
Funding: Supported by the Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (No. SML2021SP201), the National Natural Science Foundation of China (Grant Nos. 42306200 and 42306216), the National Key Research and Development Program of China (Grant No. 2023YFC3008100), the Innovation Group Project of the Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (Grant No. 311021004), and the Oceanic Interdisciplinary Program of Shanghai Jiao Tong University (Project No. SL2021ZD203).
Abstract: Tropical cyclones (TCs) are complex and powerful weather systems, and accurately forecasting their path, structure, and intensity remains a critical focus and challenge in meteorological research. In this paper, we propose an Attention Spatio-Temporal predictive Generative Adversarial Network (AST-GAN) model for predicting the temporal and spatial distribution of TCs. The model forecasts the spatial distribution of TC wind speeds for the next 15 hours at 3-hour intervals, emphasizing the cyclone's center, high wind-speed areas, and its asymmetric structure. To effectively capture spatiotemporal feature transfer at different time steps, we employ a channel attention mechanism for feature selection, enhancing model performance and reducing parameter redundancy. We utilized High-Resolution Weather Research and Forecasting (HWRF) data to train our model, allowing it to assimilate a wide range of TC motion patterns. The model is versatile and can be applied to various complex scenarios, such as multiple TCs moving simultaneously or TCs approaching landfall. Our proposed model demonstrates superior forecasting performance, achieving a root-mean-square error (RMSE) of 0.71 m s^(-1) for overall wind speed and 2.74 m s^(-1) for maximum wind speed when benchmarked against ground truth data from HWRF. Furthermore, the model underwent optimization and independent testing using ERA5 reanalysis data, showcasing its stability and scalability. After fine-tuning on the ERA5 dataset, the model achieved an RMSE of 1.33 m s^(-1) for wind speed and 1.75 m s^(-1) for maximum wind speed. The AST-GAN model outperforms other state-of-the-art models in RMSE on both the HWRF and ERA5 datasets, maintaining its superior performance and demonstrating its effectiveness for spatiotemporal prediction of TCs.