Abstract: In daily life, keyword spotting plays an important role in human-computer interaction. However, noise often interferes with the extraction of time-frequency information, and achieving both computational efficiency and recognition accuracy on resource-constrained devices such as mobile terminals remains a major challenge. To address this, we propose a novel time-frequency dual-branch parallel residual network, which integrates a Dual-Branch Broadcast Residual module and a Time-Frequency Coordinate Attention module. The time-domain and frequency-domain branches are designed in parallel to independently extract temporal and spectral features, avoiding the potential information loss caused by serial stacking while enhancing information flow and multi-scale feature fusion. For training, a curriculum learning approach progressively improves model robustness by moving from easy to difficult tasks. Experimental results demonstrate that the proposed method consistently outperforms existing lightweight models under various signal-to-noise ratio (SNR) conditions, achieving superior far-field recognition performance on the Google Speech Commands V2 dataset. Notably, the model maintains stable performance even in low-SNR environments such as -10 dB and generalizes well to SNR conditions unseen during training, validating its robustness to novel noise scenarios. Furthermore, the proposed model has significantly fewer parameters, making it highly suitable for deployment on resource-limited devices. Overall, the model achieves a favorable balance between performance and parameter efficiency, demonstrating strong potential for practical applications.
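The abstract above evaluates robustness at fixed SNR levels and trains from easy to difficult conditions. The following is a minimal sketch of how speech can be mixed with noise at a target SNR and how an easy-to-hard SNR schedule might look. It is not the authors' pipeline: the function names, the 20 dB starting point, and the 5 dB step are assumptions for illustration; only the -10 dB endpoint comes from the abstract.

```python
import math

def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise ratio equals `snr_db`, then add it.

    SNR in dB is 20 * log10(rms(speech) / rms(scaled_noise)).
    """
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]

def curriculum_snrs(start_db=20, stop_db=-10, step_db=5):
    """Hypothetical easy-to-hard schedule: high SNR first, low SNR last.

    The start value and step are illustrative assumptions.
    """
    return list(range(start_db, stop_db - 1, -step_db))
```

A training loop could iterate over `curriculum_snrs()` and call `mix_at_snr` on each batch, so that the model sees cleaner mixtures before noisier ones.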
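Abstract: As the share of photovoltaic (PV) generation in the global energy system continues to rise, ultra-short-term PV power generation forecasting is critical for power system dispatch and secure operation. However, PV output is affected by many factors and exhibits pronounced randomness and volatility. To address this, an ultra-short-term PV power generation forecasting method based on a TCN-BiLSTM-Attention model is proposed. First, key features are selected through Pearson correlation analysis, outliers are detected with the isolation forest algorithm, and data preprocessing is completed with linear interpolation and standardization. Next, a Temporal Convolutional Network (TCN) extracts temporal features, a Bidirectional Long Short-Term Memory (BiLSTM) network captures forward and backward temporal dependencies, and an attention mechanism at the output focuses on the features of key time steps. Finally, comparative experiments on the Desert Knowledge Australia Solar Centre (DKASC) dataset show that, compared with conventional LSTM and BiLSTM models, the proposed TCN-BiLSTM-Attention model offers some advantage in prediction accuracy, stability, and related aspects.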
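Three of the preprocessing steps named in the abstract above (Pearson correlation, linear interpolation, standardization) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, missing values are assumed to be marked as `None`, and the isolation forest step is omitted because it needs an external library.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def interpolate_gaps(values):
    """Fill interior None entries linearly between the nearest known neighbours.

    Leading or trailing None entries are left untouched (assumption).
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            filled[i] = filled[left] + t * (filled[right] - filled[left])
    return filled

def standardize(values):
    """Z-score standardization: zero mean, unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]
```

A feature would typically be kept when the absolute value of `pearson(feature, power)` exceeds a chosen threshold; the abstract does not state the threshold used.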
Funding: Supported by the National Natural Science Foundation of China under Grant No. 12204062 and the Natural Science Foundation of Shandong Province under Grant No. ZR2022MF330.
Abstract: To enhance speech emotion recognition capability, this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup (AAM) and improved coordinate and shuffle attention (ICASA) methods. The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients, thus enabling information fusion of speech data with different emotions at the acoustic level. The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention (ICA) and shuffle attention (SA) techniques. The ICA technique reduces computational overhead by employing depthwise separable convolution and an h-swish activation function, and it captures long-range dependencies of multi-scale time-frequency features using the attention weights. The SA technique promotes feature interaction through channel shuffling, which helps the model learn richer and more discriminative emotional features. Experimental results demonstrate that, compared to the baseline model, the proposed model improves the weighted accuracy by 5.42% and 4.54%, and the unweighted accuracy by 3.37% and 3.85%, on the IEMOCAP and RAVDESS datasets, respectively. These improvements were confirmed to be statistically significant by independent-samples t-tests, further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.
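The abstract above reports both weighted and unweighted accuracy without defining them. The sketch below uses the definitions common in speech emotion recognition work, which is an assumption here: weighted accuracy as overall accuracy across all samples, and unweighted accuracy as the mean of per-class recalls. The function names are hypothetical.

```python
from collections import defaultdict

def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

On an imbalanced set the two can diverge sharply: a classifier that always predicts the majority class scores well on the first metric and poorly on the second.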
Funding: Supported by the Innovative Human Resource Development for Local Intellectualization program through the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. IITP-2026-2020-0-01741) and the research fund of Hanyang University (HY-2025-1110).
Abstract: Arrhythmias occur frequently in clinical practice, but accurately distinguishing subtle rhythm abnormalities remains an ongoing difficulty for ECG-based studies. A review of existing studies suggests two main contributing factors: the uneven distribution of arrhythmia classes and the limited expressiveness of features learned by current models. To overcome these limitations, this study proposes a dual-path multimodal framework, termed DM-EHC (Dual-Path Multimodal ECG Heartbeat Classifier), for ECG-based heartbeat classification. The proposed framework links 1D ECG temporal features with 2D time-frequency features. With these two paths, the model can process more dimensions of feature information. The MIT-BIH arrhythmia database was selected as the baseline dataset for the experiments. Experimental results show that the proposed method outperforms single modalities and performs better for certain specific types of arrhythmias. The model achieved mean precision, recall, and F1 score of 95.14%, 92.26%, and 93.65%, respectively. These results indicate that the framework is robust and has potential value in automated arrhythmia classification.
Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the Metaverse Support Program to Nurture the Best Talents (IITP-2024-RS-2023-00254529), a grant funded by the Korea government (MSIT).
Abstract: Brain tumors require precise segmentation for diagnosis and treatment planning due to their complex morphology and heterogeneous characteristics. While MRI-based automatic brain tumor segmentation technology reduces the burden on medical staff and provides quantitative information, existing methodologies and recent models still struggle to accurately capture and classify the fine boundaries and diverse morphologies of tumors. To address these challenges and maximize the performance of brain tumor segmentation, this research introduces a novel SwinUNETR-based model by integrating a new decoder block, the Hierarchical Channel-wise Attention Decoder (HCAD), into a powerful SwinUNETR encoder. The HCAD decoder block uses hierarchical features and channel-specific attention mechanisms to further fuse information at different scales transmitted from the encoder and to preserve spatial details throughout the reconstruction phase. Rigorous evaluations on the recent BraTS GLI datasets demonstrate that the proposed SwinHCAD model achieved improved segmentation accuracy on both the Dice score and HD95 metrics across all tumor subregions (WT, TC, and ET) compared to baseline models. In particular, the rationale and contribution of the model design were clarified through ablation studies that verify the effectiveness of the proposed HCAD decoder block. The results of this study are expected to contribute greatly to the efficiency of clinical diagnosis and treatment planning by increasing the precision of automated brain tumor segmentation.
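For readers unfamiliar with the Dice score cited above, a minimal sketch of the metric for binary masks follows. The flat 0/1 list representation, the function name, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; HD95 is not covered here.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences.

    Dice = 2 * |pred AND target| / (|pred| + |target|).
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly; returning 1.0 is a convention (assumption).
    return 1.0 if total == 0 else 2.0 * intersection / total
```

In a BraTS-style evaluation this would be computed separately for each subregion mask (WT, TC, ET) after flattening the volume.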
Abstract: Modern business information systems face significant challenges in managing heterogeneous data sources, integrating disparate systems, and providing real-time decision support in complex enterprise environments. Contemporary enterprises typically operate 200+ interconnected systems, with research indicating that 52% of organizations manage three or more enterprise content management systems, creating information silos that reduce operational efficiency by up to 35%. While attention mechanisms have demonstrated remarkable success in natural language processing and computer vision, their systematic application to business information systems remains largely unexplored. This paper presents the theoretical foundation for a Hierarchical Attention-Based Business Information System (HABIS) framework that applies multi-level attention mechanisms to enterprise environments. We provide a comprehensive mathematical formulation of the framework, analyze its computational complexity, and present a proof-of-concept implementation with simulation-based validation that demonstrates a 42% reduction in cross-system query latency compared to legacy ERP modules and a 70% improvement in prediction accuracy over baseline methods. The theoretical framework introduces four hierarchical attention levels: system-level attention for dynamic weighting of business systems, process-level attention for business process prioritization, data-level attention for critical information selection, and temporal attention for time-sensitive pattern recognition. Our complexity analysis demonstrates that the framework achieves O(n log n) computational complexity for attention computation, making it scalable to large enterprise environments, including retail supply chains with 200+ system-scale deployments. The proof-of-concept implementation validates the theoretical framework's feasibility with an MSE loss of 0.439 and response times of 0.000120 s per query, demonstrating its potential for addressing key challenges in business information systems. This work establishes a foundation for future empirical research and practical implementation of attention-driven enterprise systems.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52379098, 52274075), the Project of Xingliao Talents Program (XLYC2203008), and the Science and Technology Program Project of Liaoning Province (2025JH2/101900011).
Abstract: Understanding how rock slopes respond to blasting loads is crucial for maintaining excavation safety and slope stability. Nevertheless, the spatiotemporal evolution, nonlinear dependence on blasting parameters, and predictive behavior of dominant frequency responses in slope vibrations remain insufficiently understood and quantified. This study combines time-frequency analysis with machine learning to explore how the dominant frequency (f_d) evolves in slopes under blasting. Continuous Wavelet Transform (CWT) was employed to characterize the time-frequency evolution of vibration signals, revealing that the dominant frequency exhibits strong spatial dependence and nonlinear variability influenced by blasting parameters and rock mass structures. Three machine learning models, namely Back Propagation Neural Network (BP), Support Vector Machine (SVM), and Random Forest (RF), were developed to predict f_d based on 1,000 monitoring samples obtained from numerical and field simulations. Among them, the RF model achieved the highest prediction accuracy, with mean absolute percentage errors (MAPE) below 15%, demonstrating strong robustness and generalization capability. Our analysis shows that external excitation factors, especially the loading frequency (f_d), mainly control the frequency response, while internal controlling factors, such as spatial position, lithological variation, and mechanical heterogeneity, modulate localized frequency amplification and energy redistribution. The results reveal that f_d tends to decrease with elevation and distance from the blasting source, whereas structural planes and weathered zones induce high-frequency amplification due to scattering and modal coupling effects. This study presents a framework combining time-frequency analysis and machine learning to measure the nonlinear interaction between blasting and rock mass response, offering new insights for dynamic stability evaluation and hazard mitigation in complex rock slope systems.
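Two quantities in the abstract above, the dominant frequency and the MAPE, can be illustrated with a short sketch. Note the substitution: the study uses the Continuous Wavelet Transform, whereas this sketch uses a plain discrete Fourier transform over the whole record, which yields a single dominant frequency and no time resolution. The function names are hypothetical, and the direct O(n²) DFT is chosen only to keep the example dependency-free.

```python
import cmath
import math

def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the largest-magnitude DFT bin, ignoring the DC bin.

    Stand-in for a CWT-based estimate; a plain DFT has no time localization.
    """
    n = len(signal)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate / n

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)
```

For long vibration records an FFT routine from a numerical library would replace the inner loop; the bin-to-frequency conversion stays the same.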
Funding: National Natural Science Foundation of China (Grant Nos. 42388102, 42030105, 42192535) and the Open Fund of State Key Laboratory of Precision Geodesy, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences (Grant No. SKLPG2025-1-5).
Abstract: State-of-the-art optical atomic clocks and time-frequency signal transmission open a fresh field for gravity potential (geopotential) determination. Various methods, including optical fiber frequency transfer, satellite two-way, satellite common-view, satellite carrier phase, VLBI, tri-frequency combination, and dual-frequency combination, were developed to determine geopotential differences using optical atomic clocks and then to determine the geopotential at station B based on the geopotential at station A. This review elaborates the principles, methods, scientific objectives, applications, and relevant research trends of geopotential determination based on time-frequency signals.
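The step from a clock comparison to the geopotential at station B, described above, rests on the gravitational frequency shift. The sketch below uses the first-order relation (f_B - f_A) / f_A = -(W_B - W_A) / c², under the geodetic sign convention in which the geopotential W is positive and decreases with height; that convention, the neglect of higher-order terms, and the function names are assumptions of this example, not statements from the review.

```python
C = 299_792_458.0  # speed of light in m/s (exact by SI definition)

def geopotential_difference(frac_freq_shift):
    """W_B - W_A in m^2/s^2 from the fractional shift (f_B - f_A) / f_A.

    First-order relation only; geodetic sign convention assumed.
    """
    return -frac_freq_shift * C ** 2

def geopotential_at_b(w_a, frac_freq_shift):
    """Geopotential at station B given the known value at station A."""
    return w_a + geopotential_difference(frac_freq_shift)
```

Under these assumptions a fractional frequency shift of 1e-18 corresponds to a geopotential difference of about 0.09 m²/s² in magnitude, roughly 1 cm of height near the Earth's surface.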