This paper investigates small-signal modeling of the indium phosphide high electron mobility transistor (InP HEMT) based on a Transformer neural network. The AC S-parameters of the HEMT device are trained and validated using the Transformer model. In the proposed model, eight Transformer encoder layers are connected in series, and each encoder layer consists of a multi-head attention layer and a feed-forward neural network layer. Experimental results show that the measured and modeled S-parameters of the HEMT device match well over the frequency range of 0.5-40 GHz, with errors versus frequency of less than 1%. Compared with other models, the proposed model achieves good accuracy, which verifies its effectiveness.
Micro-expressions are spontaneous, unconscious movements that reveal true emotions. Accurate facial movement information and network training methods are crucial for micro-expression recognition. However, most existing micro-expression recognition techniques focus on modeling a single category of micro-expression images and on neural network structure. To address the low recognition rate and weak generalization ability of existing models, a micro-expression recognition algorithm based on a graph convolutional network (GCN) and the Transformer model is proposed. First, action unit (AU) features are detected and extracted, and the facial muscle nodes in the neighborhood are divided into three subsets for recognition. Then, a graph convolution layer is used to find the layout of dependencies between AU nodes for micro-expression classification. Finally, the multiple attentional features of each facial action are enriched with the Transformer model to include more sequence information before the overall correlation of each region is calculated. The proposed method is validated on the CASME II and CAS(ME)^2 datasets, and the recognition rate reaches 69.85%.
The oil industry is an important part of a country's economy, and the price of crude oil is influenced by a wide range of variables. How accurately countries can predict its behavior and which predictors to employ are therefore two main questions. In this view, we propose using deep learning and ensemble learning techniques to improve crude oil price forecasting performance. The suggested method is based on a deep learning snapshot ensemble of the Transformer model. To examine the proposed model, this paper compares the deep learning ensemble against different machine learning and statistical models for daily Organization of the Petroleum Exporting Countries (OPEC) oil price forecasting. Experimental results demonstrate that the proposed method outperforms the statistical and machine learning methods. More precisely, according to the mean square error metric, the proposed snapshot ensemble of Transformers achieved relative improvements in forecasting performance over the autoregressive integrated moving average models ARIMA(1,1,1) and ARIMA(0,1,1), the autoregressive moving average model ARMA(0,1), vector autoregression (VAR), random walk (RW), support vector machine (SVM), and random forests (RF) models of 99.94%, 99.62%, 99.87%, 99.65%, 7.55%, 98.38%, and 99.35%, respectively.
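The relative improvements quoted in the preceding abstract can be read as percentage reductions in mean square error against each baseline. The sketch below illustrates that reading. The formula `100 * (MSE_baseline - MSE_model) / MSE_baseline` is an assumption about how the figures were derived, and the function names and toy prices are hypothetical, not taken from the paper.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_mse, model_mse):
    """Percentage reduction in MSE relative to a baseline (assumed definition)."""
    if baseline_mse <= 0:
        raise ValueError("baseline MSE must be positive")
    return 100.0 * (baseline_mse - model_mse) / baseline_mse


# Hypothetical toy prices, not data from the paper.
actual = [10.0, 12.0, 11.0, 13.0]
baseline_forecast = [8.0, 14.0, 9.0, 15.0]  # every error is 2, so MSE = 4.0
model_forecast = [9.0, 13.0, 10.0, 14.0]    # every error is 1, so MSE = 1.0

improvement = relative_improvement(
    mse(actual, baseline_forecast), mse(actual, model_forecast)
)
print(f"{improvement:.2f}%")  # 75.00%
```

Under this definition, a value near 100% means the baseline's error is almost entirely removed, while a small value such as the 7.55% reported against the random walk indicates a much closer competitor.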
Accurate forecasting of buildings' energy demand is essential for building operators to manage loads and resources efficiently, and for grid operators to balance local production with demand. However, current models still struggle to capture nonlinear relationships influenced by external factors such as weather and consumer behavior, assume constant variance in energy data over time, and often fail to model sequential data. To address these limitations, we propose a hybrid Transformer-based model with Liquid Neural Networks and learnable encodings for building energy forecasting. The model uses dense layers to learn nonlinear mappings that create embeddings capturing the underlying patterns in time series energy data. Additionally, a convolutional neural network encoder is integrated to enhance the model's ability to understand temporal dynamics through spatial mappings. To address the limitations of classic attention mechanisms, we implement a reservoir processing module using Liquid Neural Networks, which introduces a controlled nonlinearity through dynamic reservoir computing and enables the model to capture complex patterns in the data. For model evaluation, we used both pilot data and state-of-the-art datasets to determine the model's performance across various building contexts, including large apartment and commercial buildings and small households, with and without on-site energy production. The proposed Transformer model demonstrates good predictive accuracy and training time efficiency across various types of buildings and testing configurations. Specifically, SMAPE scores indicate a reduction in prediction error, with improvements ranging from 1.5% to 50% over basic Transformer, LSTM, and ANN models, while the higher R² values further confirm the model's reliability in capturing energy time series variance. The 8% improvement in training time over the basic Transformer model highlights the hybrid model's computational efficiency without compromising accuracy.
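For readers unfamiliar with the SMAPE score cited in the preceding abstract, the following is a minimal sketch of one common definition of the symmetric mean absolute percentage error. The abstract does not state which SMAPE variant the authors used, so the formula here is an assumption, and the sample load values are hypothetical.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent.

    Uses the common form mean(2 * |a - p| / (|a| + |p|)) * 100.
    Pairs where both values are zero contribute zero error.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        if denom == 0:
            continue  # both values are zero: treat as a perfect prediction
        total += 2.0 * abs(a - p) / denom
    return 100.0 * total / len(actual)


# Hypothetical hourly loads in kWh, not data from the paper.
measured = [100.0, 200.0]
forecast = [110.0, 180.0]
print(round(smape(measured, forecast), 2))
```

Because the denominator uses both the measured and the predicted value, this variant is bounded between 0% and 200%, which makes it less sensitive than plain MAPE to near-zero loads such as those of small households.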
Cyberbullying on social media poses significant psychological risks, yet most detection systems oversimplify the task by focusing on binary classification, ignoring nuanced categories such as passive-aggressive remarks or indirect slurs. To address this gap, we propose a hybrid framework combining Term Frequency-Inverse Document Frequency (TF-IDF), word-to-vector (Word2Vec), and Bidirectional Encoder Representations from Transformers (BERT) based models for multi-class cyberbullying detection. Our approach integrates TF-IDF for lexical specificity and Word2Vec for semantic relationships, fused with BERT's contextual embeddings to capture syntactic and semantic complexities. We evaluate the framework on a publicly available dataset of 47,000 annotated social media posts across five cyberbullying categories: age, ethnicity, gender, religion, and indirect aggression. Among the BERT variants tested, BERT Base Uncased achieved the highest performance, with 93% accuracy (standard deviation of ±1% across 5-fold cross-validation) and an average AUC of 0.96, outperforming standalone TF-IDF (78%) and Word2Vec (82%) models. Notably, it achieved near-perfect AUC scores (0.99) for age-based and ethnicity-based bullying. A comparative analysis with state-of-the-art benchmarks, including Generative Pre-trained Transformer 2 (GPT-2) and Text-to-Text Transfer Transformer (T5) models, highlights BERT's superiority in handling ambiguous language. This work advances cyberbullying detection by demonstrating how hybrid feature extraction and Transformer models improve multi-class classification, offering a scalable solution for moderating nuanced harmful content.
AlphaPanda (AlphaFold2[1] inspired protein-specific antibody design in a diffusional manner) is an advanced algorithm for designing the complementarity-determining regions (CDRs) of antibodies that target a specific epitope, combining Transformer[2] models, 3DCNN[3], and diffusion[4] generative models.
The pursuit of optimal neural network architectures is foundational to the progression of Neural Architecture Search (NAS). However, existing NAS methods that rely on traditional search strategies struggle with large and complex search spaces: it is difficult to find more effective architectures within a reasonable time, which leads to inferior search results. This research introduces the Generative Pre-trained Transformer NAS (GPT-NAS), an approach designed to overcome the limitations inherent in traditional NAS strategies. The approach improves search efficiency and obtains better architectures by integrating a GPT model into the search process. Specifically, we design a reconstruction strategy that uses the trained GPT to reorganize the architectures obtained from the search. In addition, to equip the GPT model with neural architecture design capabilities, we train it on a neural architecture dataset. For each architecture, the structural information of the previous layers is used to predict the structure of the next layer, iteratively traversing the entire architecture. In this way, the GPT model can efficiently learn the key features required for neural architectures. Extensive experimental validation shows that GPT-NAS beats both manually constructed neural architectures and architectures generated automatically by NAS. We also validate the benefit of introducing the GPT model in several ways and find that the accuracy of the searched neural architecture on the image dataset improves by up to about 9% after the GPT model is introduced.
Following the triple La Niña events during 2020–2022, the future evolution of climate conditions over the tropical Pacific has been of focused interest in ENSO-related communities. Observations and modeling studies indicate that an El Niño event is occurring in 2023; however, large uncertainties remain regarding its detailed evolution, and the factors affecting its resultant amplitude remain to be understood. Here, a novel deep learning-based Transformer model is adopted to make real-time predictions of the 2023–2024 climate conditions in the tropical Pacific. Several key fields vital to the El Niño and Southern Oscillation (ENSO) in the tropical Pacific are collectively and simultaneously used in model training and in making predictions; this purely data-driven model is therefore configured in both the training and predicting procedures such that the coupled ocean-atmosphere interactions are adequately represented. As in dynamic models, the prediction procedure is executed in a rolling manner to allow ocean-atmosphere anomaly exchanges month by month; the related key fields during multi-month time intervals (TIs) prior to the prediction target months are taken as input predictors, serving as initial conditions that precondition the future evolution more effectively. Real-time predictions indicate that the climate conditions in the tropical Pacific are sure to develop into an El Niño state in late 2023. Furthermore, sensitivity experiments are conducted to examine how prediction skill is affected by the input predictor specifications, including the TIs during which information on initial conditions is retained for making predictions. A comparison with other dynamic coupled models is also made to demonstrate the prediction performance for the 2023–2024 El Niño event.
In dynamic 5G network environments, user mobility and heterogeneous network topologies pose dual challenges to improving the performance of mobile edge caching. Existing studies often overlook the dynamic nature of user locations and the potential of device-to-device (D2D) cooperative caching, which limits the reduction of transmission latency. To address this issue, this paper proposes a joint optimization scheme for edge caching that integrates user mobility prediction with deep reinforcement learning. First, a Transformer-based geolocation prediction model is designed, leveraging multi-head attention mechanisms to capture correlations in historical user trajectories for accurate future location prediction. Then, within a three-tier heterogeneous network, we formulate a latency minimization problem under a D2D cooperative caching architecture and develop a mobility-aware Deep Q-Network (DQN) caching strategy. This strategy takes predicted location information as state input and dynamically adjusts the content distribution across small base stations (SBSs) and mobile users (MUs) to reduce end-to-end delay in multi-hop content retrieval. Simulation results show that the proposed DQN-based method outperforms other baseline strategies across various metrics, achieving a 17.2% reduction in transmission delay compared to DQN methods without mobility integration, thus validating the effectiveness of jointly optimizing location prediction and caching decisions.
The identification of ore grades is a critical step in mineral resource exploration and mining. Prompt gamma neutron activation analysis (PGNAA) technology employs gamma rays generated by the nuclear reactions between neutrons and samples to achieve the qualitative and quantitative detection of sample components. In this study, we present a novel method for identifying copper grade by combining the vision transformer (ViT) model with the PGNAA technique. First, a Monte Carlo simulation is employed to determine the optimal sizes of the neutron moderator and thermal neutron absorption material, as well as the dimensions of the device. Subsequently, based on the parameters obtained through optimization, a PGNAA copper ore measurement model is established. The gamma spectrum of the copper ore is analyzed using the ViT model, whose hyperparameters are optimized using a grid search. To ensure the reliability of the identification results, the test results are obtained through five repeated tenfold cross-validations. Long short-term memory and convolutional neural network models are compared with the ViT method. The results indicate that the ViT method is efficient in identifying copper ore grades, with average accuracy, precision, recall, F_(1) score, and F_(1)(-) score values of 0.9795, 0.9637, 0.9614, 0.9625, and 0.9942, respectively. When identifying associated minerals, the ViT model can identify Pb, Zn, Fe, and Co minerals with identification accuracies of 0.9215, 0.9396, 0.9966, and 0.8311, respectively.
Enhancing low-light images with color distortion and uneven multi-light source distribution presents challenges. Most advanced methods for low-light image enhancement are based on the Retinex model using deep learning. Retinexformer introduces channel self-attention mechanisms in the IG-MSA. However, it fails to effectively capture long-range spatial dependencies, leaving room for improvement. Based on the Retinexformer deep learning framework, we designed the Retinexformer+ network. The "+" signifies our advancements in extracting long-range spatial dependencies. We introduced multi-scale dilated convolutions in illumination estimation to expand the receptive field. These convolutions effectively capture the weakening semantic dependency between pixels as distance increases. In illumination restoration, we used Unet++ with multi-level skip connections to better integrate semantic information at different scales. The designed Illumination Fusion Dual Self-Attention (IF-DSA) module embeds multi-scale dilated convolutions to achieve spatial self-attention. This module captures long-range spatial semantic relationships within acceptable computational complexity. Experimental results on the Low-Light (LOL) dataset show that Retinexformer+ outperforms other State-Of-The-Art (SOTA) methods in both quantitative and qualitative evaluations, with the computational complexity increased to an acceptable 51.63 GFLOPS. On the LOL_v1 dataset, Retinexformer+ shows an increase of 1.15 in Peak Signal-to-Noise Ratio (PSNR) and a decrease of 0.39 in Root Mean Square Error (RMSE). On the LOL_v2_real dataset, the PSNR increases by 0.42 and the RMSE decreases by 0.18. Experimental results on the Exdark dataset show that Retinexformer+ can effectively enhance real-scene images and maintain their semantic information.
The rise of social media platforms has revolutionized communication, enabling the exchange of vast amounts of data through text, audio, images, and videos. These platforms have become critical for sharing opinions and insights, influencing daily habits, and driving business, political, and economic decisions. Text posts are particularly significant, and natural language processing (NLP) has emerged as a powerful tool for analyzing such data. While traditional NLP methods have been effective for structured media, social media content poses unique challenges due to its informal and diverse nature. This has spurred the development of new techniques tailored for processing and extracting insights from unstructured user-generated text. One key application of NLP is the summarization of user comments to manage overwhelming content volumes. Abstractive summarization has proven highly effective in generating concise, human-like summaries, offering clear overviews of key themes and sentiments. This enhances understanding and engagement while reducing cognitive effort for users. For businesses, summarization provides actionable insights into customer preferences and feedback, enabling faster trend analysis, improved responsiveness, and strategic adaptability. By distilling complex data into manageable insights, summarization plays a vital role in improving user experiences and empowering informed decision-making in a data-driven landscape. This paper proposes a new implementation framework by fine-tuning and parameterizing Transformer Large Language Models to manage and maintain linguistic and semantic components in abstractive summary generation. The system excels in transforming large volumes of data into meaningful summaries, as evidenced by its strong performance across metrics like fluency, consistency, readability, and semantic coherence.
Model-based system-of-systems (SoS) engineering (MBSoSE) is becoming a promising solution for the design of SoS with increasing complexity. However, bridging the models from the design phase to the simulation phase poses significant challenges and requires an integrated approach. In this study, a unified requirement modeling approach is proposed based on the unified architecture framework (UAF). Theoretical models are proposed that compose formalized descriptions from both top-down and bottom-up perspectives. Based on this description, a UAF profile is proposed to represent the SoS mission and the constituent systems (CS) goals. Moreover, the agent-based simulation information is described based on the overview, design concepts, and details (ODD) protocol as the complementary part of the SoS profile, which can be transformed for different simulation platforms based on eXtensible Markup Language (XML) technology and the model-to-text method. In this way, the design of the SoS is simulated automatically in the early design stage. Finally, the method is implemented and an example is given to illustrate the whole process.
Sentence classification is the process of categorizing a sentence based on its context. Sentence categorization requires more semantic features than other tasks, such as dependency parsing, which requires more syntactic elements. Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence, recognizing the progress, or comparing impacts. Here, an ensemble of pre-trained language models is used to classify the sentences of a conversation corpus. The conversational sentences are classified into four categories: information, question, directive, and commission. The resulting sequences of classification labels are used to analyze the progress of the conversation and to predict its pecking order. An ensemble of Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Generative Pre-trained Transformer (GPT), DistilBERT, and Generalized Autoregressive Pretraining for Language Understanding (XLNet) models is trained on the conversation corpus, and hyperparameter tuning is carried out to improve sentence classification performance. This Ensemble of Pre-trained Language Models with Hyperparameter Tuning (EPLM-HT) system is trained on an annotated conversation dataset. The proposed approach outperformed the base BERT, GPT, DistilBERT, and XLNet Transformer models, and the proposed ensemble model with the fine-tuned parameters achieved an F1 score of 0.88.
The dependence of transformer performance on material properties was investigated using two laboratory-processed, 0.23 mm thick grain-oriented electrical steels with different magnetic properties, both domain-refined with electrolytically etched grooves. The iron loss at 1.7 T, 50 Hz and the flux density at 800 A/m were 0.73 W/kg and 1.89 T, respectively, for material A, and 0.83 W/kg and 1.88 T for material B. Model stacked and wound transformer core experiments using the tested materials exhibited performance that reflected the material characteristics well. In a three-phase stacked core with step-lap joints excited to 1.7 T, 50 Hz, the core loss, exciting current, and noise level were 0.86 W/kg, 0.74 A, and 52 dB, respectively, with material A, and 0.97 W/kg, 1.0 A, and 54 dB with material B. The building factors for the core losses of the two materials were almost the same in both core configurations. The effect of higher harmonics on transformer performance was also investigated.
Recent advances in low-cost cameras have facilitated surveillance in various developing towns in India. The video obtained from such surveillance is of low quality. Still, counting vehicles in such videos is necessary to avoid traffic congestion and to allow drivers to plan their routes more precisely. On the other hand, detecting vehicles in such low-quality videos is highly challenging with vision-based methodologies. In this research, an attempt is made to use low-quality videos to describe traffic in Salem town in India, which has mostly not been attempted by available sources. In this work, the Detection Transformer (DETR) model is used for object (vehicle) detection. Vehicles are predicted in a rush-hour traffic video using a set of loss functions that perform bipartite matching between the predicted and ground-truth objects. Every frame in the traffic footage carries its date and time, which are detected and retrieved using Tesseract Optical Character Recognition. The date and time extracted from the input image are combined with the number of detected objects obtained from the DETR model, which yields a vehicle report with a timestamp. A Transformer Timeseries Prediction Model (TTPM) is proposed to predict future vehicle density; here, the regular NLP layers have been removed and the temporal encoding layer has been modified. The proposed TTPM outperforms the existing models in error rate, with an RMSE of 4.313 and an MAE of 3.812.
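The RMSE and MAE figures reported for the TTPM in the preceding abstract are standard regression error metrics. As a reminder of how they are computed, here is a minimal sketch using their textbook definitions. The vehicle counts are hypothetical and the function names are illustrative, not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical vehicle counts per time slot, not data from the paper.
observed = [10, 20, 30]
forecast = [12, 18, 33]
print(round(rmse(observed, forecast), 3), round(mae(observed, forecast), 3))
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few large errors dominate, which is consistent with the reported RMSE (4.313) exceeding the reported MAE (3.812).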
Transformers are widely distributed and extremely important energy conversion equipment in the power system. However, a high-altitude electromagnetic pulse (HEMP) can induce a steep pulse voltage on overhead lines, with a peak value of up to thousands of kilovolts and a frequency of up to 100 MHz, which can easily damage transformers and seriously threaten the safe and stable operation of the power system. Therefore, to protect the power system, it is necessary to analyze the transformer wave process under the action of HEMP, which in turn requires an ultra-wideband model of the transformer. In this paper, the skin effect of the winding is analyzed, the dielectric response of oil-paper insulation is fitted with the double-relaxation Cole-Cole model, and frequency-dependent models of the distributed parameters are established. Fractional-order terms are introduced into the traditional integer-order multi-conductor transmission line (MTL) model to describe the characteristics of these distributed parameters, so that the applicable frequency range of the MTL model is extended to 100 MHz. The proposed MTL model is solved by the finite-difference time-domain algorithm and verified by the finite element method and by experiment. The error is less than 5%, which verifies the accuracy and effectiveness of the model.
The convolutional neural network (CNN) method based on DeepLabv3+ has several problems in the semantic segmentation of high-resolution remote sensing images, such as a fixed receptive field size for feature extraction, a lack of semantic information, high decoder magnification, and insufficient detail retention. To address these problems, a hierarchical feature fusion network (HFFNet) is proposed. First, a combination of Transformer and CNN architectures is employed to extract features from images of varying resolutions, and the extracted features are processed independently. Subsequently, the features from the Transformer and the CNN are fused under the guidance of features from different sources. This fusion process helps restore information more comprehensively during the decoding stage. Furthermore, a spatial channel attention module is designed in the final stage of decoding to refine features and reduce the semantic gap between shallow CNN features and deep decoder features. The experimental results show that HFFNet achieves superior performance on the UAVid, LoveDA, Potsdam, and Vaihingen datasets, and its cross-linking index is better than that of DeepLabv3+ and other competing methods, showing strong generalization ability.
This study proposes a virtual healthcare assistant framework designed to provide support in multiple languages for efficient and accurate healthcare assistance. The system employs a Transformer model to process sophisticated, multilingual user inputs and to gain improved contextual understanding compared with conventional models, including long short-term memory (LSTM) models. In contrast to LSTMs, which process information sequentially and may struggle with long-range dependencies, Transformers use self-attention to learn relationships among all parts of the input in parallel. This enables them to perform more accurately across languages and contexts, making them well suited for translation, summarization, and conversational applications. Comparative evaluations revealed the superiority of the Transformer model (accuracy rate: 85%) over the LSTM model (accuracy rate: 65%). The experiments revealed several advantages of the Transformer architecture over the LSTM model, such as more effective self-attention, parallel processing, and contextual understanding for better multilingual compatibility. Additionally, our prediction model was effective for disease diagnosis, with an accuracy of 85% or greater in identifying the relationship between symptoms and diseases among different demographics. The system provides translation support from English to other languages, with the highest Bilingual Evaluation Understudy (BLEU) score for English to French (0.7), followed by English to Hindi (0.6). The lowest BLEU score was found for English to Telugu (0.39). This virtual assistant can also perform symptom analysis and disease prediction, with output given in the user's preferred language.
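The contrast drawn in the preceding abstract, where self-attention relates every part of the input to every other part at once, can be illustrated with a minimal scaled dot-product self-attention sketch. This is a simplified illustration and not the authors' model: it assumes identity projections (queries, keys, and values are all the raw token vectors), a single head, and hypothetical two-dimensional token embeddings.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with Q = K = V = tokens.

    Learned projection matrices and multiple heads are omitted for brevity.
    Returns the attended outputs and the attention weight matrix.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Score this token against every token, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all token vectors.
        outputs.append(
            [sum(w * value[c] for w, value in zip(row, tokens)) for c in range(dim)]
        )
    return outputs, weights


# Hypothetical 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of the weight matrix depends only on the input tokens, not on the other rows, so all rows can be computed in parallel. An LSTM, by contrast, must process position t before it can process position t+1.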
Automated and accurate movie genre classification is crucial for content organization, recommendation systems, and audience targeting in the film industry. Although most existing approaches focus on audiovisual features such as trailers and posters, text-based classification remains underexplored despite its accessibility and semantic richness. This paper introduces the Genre Attention Model (GAM), a deep learning architecture that integrates Transformer models with a hierarchical attention mechanism to extract and leverage contextual information from movie plots for multi-label genre classification. To assess its effectiveness, we evaluate multiple Transformer-based models, including Bidirectional Encoder Representations from Transformers (BERT), A Lite BERT (ALBERT), Distilled BERT (DistilBERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA), Generalized Autoregressive Pretraining for Language Understanding (XLNet), and Decoding-enhanced BERT with Disentangled Attention (DeBERTa). Experimental results demonstrate the superior performance of the DeBERTa-based GAM, which employs a two-tier hierarchical attention mechanism: word-level attention highlights key terms, while sentence-level attention captures critical narrative segments, ensuring a refined and interpretable representation of movie plots. Evaluated on three benchmark datasets (Trailers12K, Large Movie Trailer Dataset-9 (LMTD-9), and MovieLens37K), GAM achieves micro-average precision scores of 83.63%, 83.32%, and 83.34%, respectively, surpassing state-of-the-art models. Additionally, GAM is computationally efficient, requiring just 6.10 Giga Floating Point Operations Per Second (GFLOPS), making it a scalable and cost-effective solution. These results highlight the growing potential of text-based deep learning models in genre classification and GAM's effectiveness in improving predictive accuracy while maintaining computational efficiency. With its robust performance, GAM offers a versatile and scalable framework for content recommendation, film indexing, and media analytics, providing an interpretable alternative to traditional audiovisual-based classification techniques.
Funding: Supported by the National Natural Science Foundation of China (62201293, 62034003), the Open-Foundation of State Key Laboratory of Millimeter-Waves (K202313), and the Jiangsu Province Youth Science and Technology Talent Support Project (JSTJ-2024-040).
Abstract: In this paper, small-signal modeling of the Indium Phosphide High Electron Mobility Transistor (InP HEMT) based on the Transformer neural network model is investigated. The Transformer model is trained and validated on the AC S-parameters of the HEMT device. In the proposed model, eight Transformer encoder layers are connected in series, and each encoder layer consists of a multi-head attention layer and a feed-forward neural network layer. The experimental results show that the measured and modeled S-parameters of the HEMT device match well in the frequency range of 0.5-40 GHz, with errors versus frequency of less than 1%. Compared with other models, the proposed model achieves good accuracy, which verifies its effectiveness.
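The attention layer named in the abstract above is built on scaled dot-product attention. The following is a minimal single-head sketch in plain Python, assuming no learned projection matrices, no masking, and toy list-of-lists inputs; the function names are illustrative and are not taken from the paper, whose eight-layer implementation is not described in enough detail to reproduce here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    queries, keys, values are lists of equal-length float vectors.
    Assumption: learned projections and multiple heads are omitted.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        width = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return outputs


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 1.0], [1.0, 1.0]]
    v = [[0.0, 2.0], [4.0, 6.0]]
    print(scaled_dot_product_attention(q, k, v))
```

Because both keys in the usage example are identical, the attention weights are uniform and the output is the mean of the two value vectors. A multi-head layer would run several such heads on separately projected inputs and concatenate the results.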
Funding: Supported by Shaanxi Province Key Research and Development Project (2021GY-280) and the National Natural Science Foundation of China (No. 61834005, 61772417, 61802304).
Abstract: Micro-expressions are spontaneous, unconscious movements that reveal true emotions. Accurate facial movement information and network training methods are crucial for micro-expression recognition. However, most existing micro-expression recognition technologies focus on modeling a single category of micro-expression images and on neural network structure. To address the low recognition rate and weak generalization ability of micro-expression recognition models, a micro-expression recognition algorithm based on a graph convolution network (GCN) and the Transformer model is proposed. First, action unit (AU) features are detected and extracted, and the facial muscle nodes in the neighborhood are divided into three subsets for recognition. Then, a graph convolution layer is used to find the layout of dependencies between AU nodes for micro-expression classification. Finally, the multiple attentional features of each facial action are enriched with the Transformer model to include more sequence information before the overall correlation of each region is calculated. The proposed method is validated on the CASME II and CAS(ME)^2 datasets, where the recognition rate reaches 69.85%.
Abstract: The oil industry is an important part of a country's economy. The price of crude oil is influenced by a wide range of variables. Therefore, how accurately countries can predict its behavior and which predictors to employ are two main questions. In this view, we propose utilizing deep learning and ensemble learning techniques to boost crude oil price forecasting performance. The suggested method is based on a deep learning snapshot ensemble of the Transformer model. To examine the superiority of the proposed model, this paper compares the proposed deep learning ensemble model against different machine learning and statistical models for daily Organization of the Petroleum Exporting Countries (OPEC) oil price forecasting. Experimental results demonstrate that the proposed method outperforms statistical and machine learning methods. More precisely, according to the mean square error metric, the proposed snapshot ensemble of Transformer models achieved relative improvements in forecasting performance over the autoregressive integrated moving average models ARIMA(1,1,1) and ARIMA(0,1,1), autoregressive moving average (ARMA)(0,1), vector autoregression (VAR), random walk (RW), support vector machine (SVM), and random forests (RF) models of 99.94%, 99.62%, 99.87%, 99.65%, 7.55%, 98.38%, and 99.35%, respectively.
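The abstract above reports relative improvements under the mean square error metric without defining the calculation. A minimal sketch follows, assuming the common definition (baseline error minus model error, divided by baseline error); the function names and the sample numbers are illustrative and are not taken from the paper.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to a baseline.

    Assumption: improvement = 100 * (baseline - model) / baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical prices and forecasts, for illustration only.
    prices = [70.0, 72.0, 71.0]
    baseline_forecast = [66.0, 76.0, 75.0]
    model_forecast = [69.0, 73.0, 72.0]
    base_mse = mean_squared_error(prices, baseline_forecast)
    model_mse = mean_squared_error(prices, model_forecast)
    print(relative_improvement(base_mse, model_mse))
```

Under this definition, a value near 100% means the model's error is a small fraction of the baseline's, while a small value such as the one reported against the random walk means the two errors are close.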
Funding: Supported by the DEDALUS project, grant number 101103998, funded by the European Commission as part of the Horizon Europe Framework Programme, and by the Ministry of Research, Innovation and Digitization, CNCS/CCCDI-UEFISCDI, project number PN-IV-P8-8.1-PRE-HE-ORG-2023-0111, within PNCDI IV.
Abstract: Accurate forecasting of buildings' energy demand is essential for building operators to manage loads and resources efficiently, and for grid operators to balance local production with demand. However, current models still struggle to capture nonlinear relationships influenced by external factors such as weather and consumer behavior, assume constant variance in energy data over time, and often fail to model sequential data. To address these limitations, we propose a hybrid Transformer-based model with Liquid Neural Networks and learnable encodings for building energy forecasting. The model leverages dense layers to learn non-linear mappings and create embeddings that capture underlying patterns in time series energy data. Additionally, a Convolutional Neural Network encoder is integrated to enhance the model's ability to understand temporal dynamics through spatial mappings. To address the limitations of classic attention mechanisms, we implement a reservoir processing module using Liquid Neural Networks, which introduces a controlled non-linearity through dynamic reservoir computing, enabling the model to capture complex patterns in the data. For model evaluation, we used both pilot data and state-of-the-art datasets to determine the model's performance across various building contexts, including large apartment and commercial buildings and small households, with and without on-site energy production. The proposed Transformer model demonstrates good predictive accuracy and training time efficiency across various types of buildings and testing configurations. Specifically, SMAPE scores indicate a reduction in prediction error, with improvements ranging from 1.5% to 50% over basic Transformer, LSTM, and ANN models, while the higher R² values further confirm the model's reliability in capturing energy time series variance. The 8% improvement in training time over the basic Transformer model highlights the hybrid model's computational efficiency without compromising accuracy.
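The abstract above evaluates forecasts with SMAPE (symmetric mean absolute percentage error) but does not say which of the several published variants is used. The sketch below assumes the variant whose denominator is the mean of the absolute actual and forecast values, which yields scores between 0% and 200%; the zero-denominator handling is also an assumption.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Assumption: denominator is (|actual| + |forecast|) / 2, and a pair
    where both values are zero contributes zero error.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denominator = (abs(a) + abs(f)) / 2.0
        if denominator == 0.0:
            continue  # both values are zero: a perfect prediction
        total += abs(f - a) / denominator
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical hourly loads in kWh, for illustration only.
    measured = [12.0, 15.0, 9.0]
    predicted = [11.0, 16.5, 9.0]
    print(round(smape(measured, predicted), 2))
```

Because the denominator includes the forecast, SMAPE penalizes over- and under-prediction differently, so scores are only comparable between models evaluated with the same variant.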
Funding: Funded by the Scientific Research Deanship at University of Hail, Saudi Arabia, through Project Number RG-23092.
Abstract: Cyberbullying on social media poses significant psychological risks, yet most detection systems oversimplify the task by focusing on binary classification, ignoring nuanced categories such as passive-aggressive remarks or indirect slurs. To address this gap, we propose a hybrid framework combining Term Frequency-Inverse Document Frequency (TF-IDF), word-to-vector (Word2Vec), and Bidirectional Encoder Representations from Transformers (BERT) based models for multi-class cyberbullying detection. Our approach integrates TF-IDF for lexical specificity and Word2Vec for semantic relationships, fused with BERT's contextual embeddings to capture syntactic and semantic complexities. We evaluate the framework on a publicly available dataset of 47,000 annotated social media posts across five cyberbullying categories: age, ethnicity, gender, religion, and indirect aggression. Among the BERT variants tested, BERT Base Un-Cased achieved the highest performance with 93% accuracy (standard deviation ±1% across 5-fold cross-validation) and an average AUC of 0.96, outperforming standalone TF-IDF (78%) and Word2Vec (82%) models. Notably, it achieved near-perfect AUC scores (0.99) for age- and ethnicity-based bullying. A comparative analysis with state-of-the-art benchmarks, including Generative Pre-trained Transformer 2 (GPT-2) and Text-to-Text Transfer Transformer (T5) models, highlights BERT's superiority in handling ambiguous language. This work advances cyberbullying detection by demonstrating how hybrid feature extraction and transformer models improve multi-class classification, offering a scalable solution for moderating nuanced harmful content.
Funding: Supported by the Key Project of International Cooperation of Qilu University of Technology (Grant No.: QLUTGJHZ2018008), Shandong Provincial Natural Science Foundation Committee, China (Grant No.: ZR2016HB54), and Shandong Provincial Key Laboratory of Microbial Engineering (SME).
Abstract: AlphaPanda (AlphaFold2 [1] inspired protein-specific antibody design in a diffusional manner) is an advanced algorithm for designing the complementarity-determining regions (CDRs) of antibodies targeting a specific epitope, combining transformer [2] models, 3DCNN [3], and diffusion [4] generative models.
Funding: Supported by the National Nature Science Foundation of China (No. 62106161), the Fundamental Research Funds for the Central Universities (No. 1082204112364), the Sichuan University Luzhou Municipal Government Strategic Cooperation Project (No. 2022CDLZ-8), the Key R&D Program of Sichuan Province (Nos. 2022YFN0017 and 2023YFG0019), the Natural Science Foundation of Sichuan (No. 2023NSFSC0474), the Tianfiu Yongxing Laboratory Organized Research Project Funding (No. 2023CXXM14), and the Digital Media Art, Key Laboratory of Sichuan Province, Sichuan Conservatory of Music (No. 22DMAKL04).
Abstract: The pursuit of optimal neural network architectures is foundational to the progression of Neural Architecture Search (NAS). However, existing NAS methods that rely on traditional search strategies share a problem: when facing a large and complex search space, they struggle to find more effective architectures within a reasonable time, resulting in inferior search results. This research introduces the Generative Pre-trained Transformer NAS (GPT-NAS), an approach designed to overcome the limitations inherent in traditional NAS strategies. This approach improves search efficiency and obtains better architectures by integrating a GPT model into the search process. Specifically, we design a reconstruction strategy that uses the trained GPT to reorganize the architectures obtained from the search. In addition, to equip the GPT model with neural architecture design capabilities, we propose training the GPT model on a neural architecture dataset. For each architecture, the structural information of its previous layers is used to predict the structure of the next layer, iteratively traversing the entire architecture. In this way, the GPT model can efficiently learn the key features required for neural architectures. Extensive experimental validation shows that our GPT-NAS approach beats both manually constructed neural architectures and architectures generated automatically by NAS. In addition, we validate the benefit of introducing the GPT model in several ways, and find that the accuracy on the image dataset of the neural architecture obtained from the search improves by up to about 9% after introducing the GPT model.
Funding: Supported by the Laoshan Laboratory (Grant No. LSKJ202202402), the National Natural Science Foundation of China (Grant Nos. 42030410 & 42176032), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB40000000), the Startup Foundation for Introducing Talent of NUIST, and the Jiangsu Innovation Research Group (Grant No. JSSCTD202346).
Abstract: Following triple La Nina events during 2020–2022, the future evolution of climate conditions over the tropical Pacific has been a focus of interest in ENSO-related communities. Observations and modeling studies indicate that an El Nino event is occurring in 2023; however, large uncertainties remain in terms of its detailed evolution, and the factors affecting its resultant amplitude remain to be understood. Here, a novel deep learning-based Transformer model is adopted to make real-time predictions for the 2023–2024 climate conditions in the tropical Pacific. Several key fields vital to the El Nino and Southern Oscillation (ENSO) in the tropical Pacific are collectively and simultaneously used in model training and in making predictions; therefore, this purely data-driven model is configured in both training and predicting procedures such that the coupled ocean-atmosphere interactions are adequately represented. As in dynamic models, the prediction procedure is executed in a rolling manner to allow ocean-atmosphere anomaly exchanges month by month; the related key fields during multi-month time intervals (TIs) prior to the prediction target months are taken as input predictors, serving as initial conditions to precondition the future evolution more effectively. Real-time predictions indicate that the climate conditions in the tropical Pacific are certain to develop into an El Nino state in late 2023. Furthermore, sensitivity experiments are conducted to examine how prediction skill is affected by the input predictor specifications, including the TIs during which information on initial conditions is retained for making predictions. A comparison with other dynamic coupled models is also made to demonstrate the prediction performance for the 2023–2024 El Nino event.
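The rolling prediction procedure described in the abstract above, in which the fields from a multi-month interval before the target month serve as inputs and each prediction feeds the next step, can be sketched as a simple loop. This is an illustration under stated assumptions: the series is a single scalar per month rather than the paper's multi-field data, and `mean_predictor` is a hypothetical stand-in for the trained Transformer.

```python
def rolling_forecast(history, predict_next, window, steps):
    """Predict `steps` future values one month at a time.

    Each step feeds the most recent `window` values (observed or
    previously predicted) to `predict_next`, then appends the result
    so it becomes part of the input for the following step.
    """
    if window <= 0 or len(history) < window:
        raise ValueError("history must contain at least `window` values")
    series = list(history)
    forecasts = []
    for _ in range(steps):
        inputs = series[-window:]
        next_value = predict_next(inputs)
        forecasts.append(next_value)
        series.append(next_value)
    return forecasts


def mean_predictor(inputs):
    """Hypothetical placeholder model: average of the input window."""
    return sum(inputs) / len(inputs)


if __name__ == "__main__":
    # Hypothetical monthly anomaly index values, for illustration only.
    observed = [1.0, 2.0, 3.0]
    print(rolling_forecast(observed, mean_predictor, window=2, steps=2))
```

The `window` argument plays the role of the time interval (TI) length, which the abstract identifies as one of the input specifications examined in the sensitivity experiments.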
Funding: Supported by the Liaoning Provincial Education Department Fund, grant number JYTZD2023083.
Abstract: In dynamic 5G network environments, user mobility and heterogeneous network topologies pose dual challenges to improving the performance of mobile edge caching. Existing studies often overlook the dynamic nature of user locations and the potential of device-to-device (D2D) cooperative caching, limiting the reduction of transmission latency. To address this issue, this paper proposes a joint optimization scheme for edge caching that integrates user mobility prediction with deep reinforcement learning. First, a Transformer-based geolocation prediction model is designed, leveraging multi-head attention mechanisms to capture correlations in historical user trajectories for accurate future location prediction. Then, within a three-tier heterogeneous network, we formulate a latency minimization problem under a D2D cooperative caching architecture and develop a mobility-aware Deep Q-Network (DQN) caching strategy. This strategy takes predicted location information as state input and dynamically adjusts the content distribution across small base stations (SBSs) and mobile users (MUs) to reduce end-to-end delay in multi-hop content retrieval. Simulation results show that the proposed DQN-based method outperforms other baseline strategies across various metrics, achieving a 17.2% reduction in transmission delay compared to DQN methods without mobility integration, thus validating the effectiveness of the joint optimization of location prediction and caching decisions.
Funding: Supported by the National Natural Science Foundation of China (Nos. U2BB2077 and 42374226), the Natural Science Foundation of Jiangxi Province (20232BAB201043 and 20232BCJ23006), and the Nuclear Energy Development Project of the National Defense Science and Industry Bureau (Nos. 20201192-01, 20201192-03).
Abstract: The identification of ore grades is a critical step in mineral resource exploration and mining. Prompt gamma neutron activation analysis (PGNAA) technology employs gamma rays generated by the nuclear reactions between neutrons and samples to achieve the qualitative and quantitative detection of sample components. In this study, we present a novel method for identifying copper grade by combining the vision transformer (ViT) model with the PGNAA technique. First, a Monte Carlo simulation is employed to determine the optimal sizes of the neutron moderator and thermal neutron absorption material, and the dimensions of the device. Subsequently, based on the parameters obtained through optimization, a PGNAA copper ore measurement model is established. The gamma spectrum of the copper ore is analyzed using the ViT model, whose hyperparameters are optimized using a grid search. To ensure the reliability of the identification results, the test results are obtained through five repeated tenfold cross-validations. Long short-term memory and convolutional neural network models are compared with the ViT method. The results indicate that the ViT method is efficient in identifying copper ore grades, with average accuracy, precision, recall, F_(1) score, and F_(1)(-) score values of 0.9795, 0.9637, 0.9614, 0.9625, and 0.9942, respectively. When identifying associated minerals, the ViT model can identify Pb, Zn, Fe, and Co minerals with identification accuracies of 0.9215, 0.9396, 0.9966, and 0.8311, respectively.
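The precision, recall, and F1 figures quoted in the abstract above follow from counts of correct and incorrect class assignments. A minimal sketch for a single class is given below; the label names are hypothetical, and how the paper averages these values across grade classes and cross-validation folds is not stated, so no averaging is shown.

```python
def precision_recall_f1(true_labels, predicted_labels, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have equal length")
    tp = fp = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive and truth == positive:
            tp += 1  # correctly identified as the positive class
        elif guess == positive:
            fp += 1  # predicted positive, actually another class
        elif truth == positive:
            fn += 1  # positive sample that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical grade labels, for illustration only.
    truth = ["high", "high", "low", "low"]
    guess = ["high", "low", "high", "low"]
    print(precision_recall_f1(truth, guess, positive="high"))
```

F1 is the harmonic mean of precision and recall, so it stays low whenever either one is low, which makes it a stricter summary than accuracy on imbalanced grade distributions.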
Funding: Supported by the Key Laboratory of Forensic Science and Technology at College of Sichuan Province (2023YB04).
Abstract: Enhancing low-light images with color distortion and uneven multi-light source distribution presents challenges. Most advanced methods for low-light image enhancement are based on the Retinex model using deep learning. Retinexformer introduces channel self-attention mechanisms in the IG-MSA. However, it fails to effectively capture long-range spatial dependencies, leaving room for improvement. Based on the Retinexformer deep learning framework, we designed the Retinexformer+ network. The “+” signifies our advancements in extracting long-range spatial dependencies. We introduced multi-scale dilated convolutions in illumination estimation to expand the receptive field. These convolutions effectively capture the weakening semantic dependency between pixels as distance increases. In illumination restoration, we used Unet++ with multi-level skip connections to better integrate semantic information at different scales. The designed Illumination Fusion Dual Self-Attention (IF-DSA) module embeds multi-scale dilated convolutions to achieve spatial self-attention. This module captures long-range spatial semantic relationships within acceptable computational complexity. Experimental results on the Low-Light (LOL) dataset show that Retinexformer+ outperforms other State-Of-The-Art (SOTA) methods in both quantitative and qualitative evaluations, with the computational complexity increased to an acceptable 51.63 GFLOPS. On the LOL_v1 dataset, Retinexformer+ shows an increase of 1.15 in Peak Signal-to-Noise Ratio (PSNR) and a decrease of 0.39 in Root Mean Square Error (RMSE). On the LOL_v2_real dataset, the PSNR increases by 0.42 and the RMSE decreases by 0.18. Experimental results on the Exdark dataset show that Retinexformer+ can effectively enhance real-scene images and maintain their semantic information.
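The two image-quality metrics reported in the abstract above are directly related: PSNR is a logarithmic function of RMSE. The sketch below assumes flattened pixel lists and an 8-bit range with a peak value of 255; the abstract does not state the pixel range or color handling used in the paper, so the absolute numbers are not comparable.

```python
import math


def rmse(reference, enhanced):
    """Root mean square error between two equal-length pixel lists."""
    if len(reference) != len(enhanced) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((r - e) ** 2 for r, e in zip(reference, enhanced))
    return math.sqrt(squared / len(reference))


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 20 * log10(max_value / RMSE).

    Assumption: max_value is 255, as for 8-bit images.
    """
    error = rmse(reference, enhanced)
    if error == 0.0:
        return math.inf  # identical images
    return 20.0 * math.log10(max_value / error)


if __name__ == "__main__":
    # Hypothetical pixel intensities, for illustration only.
    ground_truth = [120.0, 130.0, 125.0, 118.0]
    restored = [118.0, 133.0, 125.0, 121.0]
    print(rmse(ground_truth, restored), psnr(ground_truth, restored))
```

Since PSNR rises as RMSE falls, an enhancement method that improves one of the two metrics on the same image pair necessarily improves the other, which is consistent with the paired gains the abstract reports.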
Abstract: The rise of social media platforms has revolutionized communication, enabling the exchange of vast amounts of data through text, audio, images, and videos. These platforms have become critical for sharing opinions and insights, influencing daily habits, and driving business, political, and economic decisions. Text posts are particularly significant, and natural language processing (NLP) has emerged as a powerful tool for analyzing such data. While traditional NLP methods have been effective for structured media, social media content poses unique challenges due to its informal and diverse nature. This has spurred the development of new techniques tailored for processing and extracting insights from unstructured user-generated text. One key application of NLP is the summarization of user comments to manage overwhelming content volumes. Abstractive summarization has proven highly effective in generating concise, human-like summaries, offering clear overviews of key themes and sentiments. This enhances understanding and engagement while reducing cognitive effort for users. For businesses, summarization provides actionable insights into customer preferences and feedback, enabling faster trend analysis, improved responsiveness, and strategic adaptability. By distilling complex data into manageable insights, summarization plays a vital role in improving user experiences and empowering informed decision-making in a data-driven landscape. This paper proposes a new implementation framework that fine-tunes and parameterizes Transformer Large Language Models to manage and maintain linguistic and semantic components in abstractive summary generation. The system excels in transforming large volumes of data into meaningful summaries, as evidenced by its strong performance across metrics such as fluency, consistency, readability, and semantic coherence.
Funding: Fifth Electronic Research Institute of the Ministry of Industry and Information Technology (HK07202200877), Pre-research Project on Civil Aerospace Technologies of CNSA (D020101), Zhejiang Provincial Science and Technology Plan Project (2022C01052), Frontier Scientific Research Program of Deep Space Exploration Laboratory (2022-QYKYJHHXYF-018, 2022-QYKYJH-GCXD-001), and Zhiyuan Laboratory (ZYL2024001).
Abstract: Model-based system-of-systems (SoS) engineering (MBSoSE) is becoming a promising solution for the design of SoS with increasing complexity. However, bridging the models from the design phase to the simulation phase poses significant challenges and requires an integrated approach. In this study, a unified requirement modeling approach is proposed based on the unified architecture framework (UAF). Theoretical models are proposed that compose formalized descriptions from both top-down and bottom-up perspectives. Based on these descriptions, a UAF profile is proposed to represent the SoS mission and the constituent systems (CS) goals. Moreover, the agent-based simulation information is described based on the overview, design concepts, and details (ODD) protocol as a complement to the SoS profile, and can be transformed for different simulation platforms using eXtensible Markup Language (XML) technology and a model-to-text method. In this way, the design of the SoS is simulated automatically in the early design stage. Finally, the method is implemented and an example is given to illustrate the whole process.
Abstract: Sentence classification is the process of categorizing a sentence based on its context. Sentence categorization requires more semantic features than other tasks, such as dependency parsing, which requires more syntactic elements. Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence, recognizing the progress, or comparing impacts. An ensemble of pre-trained language models is used here to classify the sentences in a conversation corpus. The conversational sentences are classified into four categories: information, question, directive, and commission. These classification label sequences are used for analyzing the conversation progress and predicting the pecking order of the conversation. An ensemble of Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Generative Pre-Trained Transformer (GPT), DistilBERT, and Generalized Autoregressive Pretraining for Language Understanding (XLNet) models is trained on the conversation corpus with tuned hyperparameters. Hyperparameter tuning is carried out for better performance on sentence classification. This Ensemble of Pre-trained Language Models with Hyperparameter Tuning (EPLM-HT) system is trained on an annotated conversation dataset. The proposed approach outperformed the base BERT, GPT, DistilBERT, and XLNet transformer models. The proposed ensemble model with the fine-tuned parameters achieved an F1_score of 0.88.
Abstract: The dependence of transformer performance on material properties was investigated using two laboratory-processed 0.23 mm thick grain-oriented electrical steels with different magnetic properties, both domain-refined with electrolytically etched grooves. The iron loss at 1.7 T, 50 Hz and the flux density at 800 A/m of material A were 0.73 W/kg and 1.89 T, respectively; those of material B were 0.83 W/kg and 1.88 T. Experiments on model stacked and wound transformer cores built from the tested materials showed performance that closely reflected the material characteristics. In a three-phase stacked core with step-lap joints excited to 1.7 T, 50 Hz, the core loss, the exciting current, and the noise level were 0.86 W/kg, 0.74 A, and 52 dB, respectively, with material A, and 0.97 W/kg, 1.0 A, and 54 dB with material B. The building factors for the core losses of the two materials were almost the same in both core configurations. The effect of higher harmonics on transformer performance was also investigated.
Abstract: Recent advances in low-cost cameras have facilitated surveillance in various developing towns in India. The video obtained from such surveillance is of low quality. Still, counting vehicles in such videos is necessary to avoid traffic congestion and to allow drivers to plan their routes more precisely. On the other hand, detecting vehicles in such low-quality videos is highly challenging with vision-based methodologies. In this research, a meticulous attempt is made to use low-quality videos to describe traffic in Salem town in India, which has mostly not been attempted by available sources. In this work, the deep learning Detection Transformer (DETR) model is used for object (vehicle) detection. Vehicles are predicted in a rush-hour traffic video using a set of loss functions that perform bipartite matching between the predicted objects and the ground-truth objects. Every frame in the traffic footage carries a date and time, which are detected and retrieved using Tesseract Optical Character Recognition. The date and time extracted from the input image are combined with the number of recognized objects obtained from the DETR model. This yields a vehicle report with timestamps. A Transformer Timeseries Prediction Model (TTPM) is proposed to predict future vehicle density; here, the regular NLP layers have been removed and the temporal encoding layer has been modified. The proposed TTPM outperforms the existing models in error rate, with an RMSE of 4.313 and an MAE of 3.812.
Funding: Energy Security Technology Project of Huaneng Group Headquarters Science and Technology Project, Grant/Award Number: HNKJ20-H87.
Abstract: Transformers are widely distributed and extremely important energy conversion equipment in the power system. However, a high-altitude electromagnetic pulse (HEMP) can induce a steep pulse voltage on overhead lines, with a peak value of up to thousands of kilovolts and a frequency of up to 100 MHz, which can easily damage the transformer and seriously threaten the safe and stable operation of the power system. Therefore, to protect the power system, it is necessary to analyse the transformer wave process under the action of HEMP, which in turn requires an ultra-wideband model of the transformer. In this paper, the skin effect of the winding is analysed, the dielectric response of oil-paper is fitted with the double relaxation Cole-Cole model, and frequency-dependent models of the distribution parameters are established. Fractional-order terms are introduced into the traditional integer-order multi-conductor transmission line (MTL) model to describe the characteristics of these distribution parameters, so that the applicable frequency range of the MTL model is extended to 100 MHz. The proposed MTL model is solved by the finite-difference time-domain algorithm and verified by the finite element method and by experiment. The error is less than 5%, which verifies the model's accuracy and effectiveness.
Funding: Supported by the National Natural Science Foundation of China (No. 52374155) and the Anhui Provincial Natural Science Foundation (No. 2308085 MF218).
Abstract: The convolutional neural network (CNN) method based on DeepLabv3+ has several problems in the semantic segmentation of high-resolution remote sensing images, such as a fixed receptive field size for feature extraction, a lack of semantic information, high decoder magnification, and insufficient detail retention. To address these problems, a hierarchical feature fusion network (HFFNet) was proposed. First, a combination of transformer and CNN architectures was employed for feature extraction from images of varying resolutions. The extracted features were processed independently. Subsequently, the features from the transformer and the CNN were fused under the guidance of features from different sources. This fusion process helped restore information more comprehensively during the decoding stage. Furthermore, a spatial channel attention module was designed in the final stage of decoding to refine features and reduce the semantic gap between shallow CNN features and deep decoder features. The experimental results showed that HFFNet had superior performance on the UAVid, LoveDA, Potsdam, and Vaihingen datasets, and its cross-linking index was better than that of DeepLabv3+ and other competing methods, showing strong generalization ability.
Abstract: This study proposes a virtual healthcare assistant framework designed to provide support in multiple languages for efficient and accurate healthcare assistance. The system employs a transformer model to process sophisticated, multilingual user inputs and gain improved contextual understanding compared to conventional models, including long short-term memory (LSTM) models. In contrast to LSTMs, which process information sequentially and may struggle with long-range dependencies, transformers use self-attention to learn relationships among all parts of the input in parallel. This enables them to perform more accurately across various languages and contexts, making them well-suited for applications such as translation, summarization, and conversational assistance. Comparative evaluations revealed the superiority of the transformer model (accuracy rate: 85%) over the LSTM model (accuracy rate: 65%). The experiments revealed several advantages of the transformer architecture over the LSTM model, such as more effective self-attention, parallel processing, and contextual understanding for better multilingual compatibility. Additionally, our prediction model was effective for disease diagnosis, with an accuracy of 85% or greater in identifying the relationship between symptoms and diseases among different demographics. The system provides translation support from English to other languages, with the highest Bilingual Evaluation Understudy score for English to French (0.7), followed by English to Hindi (0.6). The lowest Bilingual Evaluation Understudy score was found for English to Telugu (0.39). This virtual assistant can also perform symptom analysis and disease prediction, with output given in the preferred language of the user.
Funding: The authors would like to thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2025).
Abstract: Automated and accurate movie genre classification is crucial for content organization, recommendation systems, and audience targeting in the film industry. Although most existing approaches focus on audiovisual features such as trailers and posters, text-based classification remains underexplored despite its accessibility and semantic richness. This paper introduces the Genre Attention Model (GAM), a deep learning architecture that integrates transformer models with a hierarchical attention mechanism to extract and leverage contextual information from movie plots for multi-label genre classification. To assess its effectiveness, we evaluate multiple transformer-based models, including Bidirectional Encoder Representations from Transformers (BERT), A Lite BERT (ALBERT), Distilled BERT (DistilBERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA), Generalized Autoregressive Pretraining for Language Understanding (XLNet), and Decoding-enhanced BERT with Disentangled Attention (DeBERTa). Experimental results demonstrate the superior performance of the DeBERTa-based GAM, which employs a two-tier hierarchical attention mechanism: word-level attention highlights key terms, while sentence-level attention captures critical narrative segments, ensuring a refined and interpretable representation of movie plots. Evaluated on three benchmark datasets, Trailers12K, Large Movie Trailer Dataset-9 (LMTD-9), and MovieLens37K, GAM achieves micro-average precision scores of 83.63%, 83.32%, and 83.34%, respectively, surpassing state-of-the-art models. Additionally, GAM is computationally efficient, requiring just 6.10 Giga Floating Point Operations Per Second (GFLOPS), making it a scalable and cost-effective solution. These results highlight the growing potential of text-based deep learning models in genre classification and GAM's effectiveness in improving predictive accuracy while maintaining computational efficiency. With its robust performance, GAM offers a versatile and scalable framework for content recommendation, film indexing, and media analytics, providing an interpretable alternative to traditional audiovisual-based classification techniques.