To address imprecise and unclear information in quality function deployment (QFD), a method for determining the priority of engineering features based on mixed linguistic variables is proposed. First, each evaluator uses the chosen linguistic variables to give the correlation-strength evaluation matrix between customer requirements and engineering features. Second, the relative importance of the evaluators and of the customer requirements is aggregated. Finally, the priority of the engineering features is obtained by calculating the deviation. The feasibility and practicality of the method are demonstrated using the design of a new long-bag low-pressure pulse dust collector as an example.
Accurate purchase prediction in e-commerce critically depends on the quality of behavioral features. This paper proposes a layered and interpretable feature engineering framework that organizes user signals into three layers: Basic, Conversion & Stability (efficiency and volatility across actions), and Advanced Interactions & Activity (cross-behavior synergies and intensity). Using real Taobao (Alibaba's primary e-commerce platform) logs (57,976 records for 10,203 users; 25 November–03 December 2017), we conducted a hierarchical, layer-wise evaluation that holds data splits and hyperparameters fixed while varying only the feature set, to quantify each layer's marginal contribution. Across logistic regression (LR), decision tree, random forest, XGBoost, and CatBoost models with stratified 5-fold cross-validation, performance improved monotonically from Basic to Conversion & Stability to Advanced features. With LR, F1 increased from 0.613 (Basic) to 0.962 (Advanced); boosted models achieved high discrimination (AUC of 0.995) and an F1 score of up to 0.983. Calibration and precision–recall analyses indicated strong ranking quality, and we acknowledge potential dataset and period biases given the short (9-day) window. By making feature contributions measurable and reproducible, the framework complements model-centric advances and offers a transparent blueprint for production-grade behavioral modeling. The code and processed artifacts are publicly available, and future work will extend the validation to longer, seasonal datasets and to hybrid approaches that combine automated feature learning with domain-driven design.
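To make the three-layer idea above concrete, here is a minimal sketch of how per-user features of each layer might be derived from raw behavior logs. The action names (`pv`, `cart`, `buy`), the `(user, action, day)` record layout, and every feature name are assumptions for illustration only; the abstract does not list the paper's actual features.

```python
from collections import Counter, defaultdict
from statistics import pstdev


def layered_features(events):
    """Build illustrative per-user features from (user, action, day) records."""
    counts = defaultdict(Counter)  # user -> number of events per action type
    daily = defaultdict(Counter)   # user -> number of events per day
    for user, action, day in events:
        counts[user][action] += 1
        daily[user][day] += 1

    features = {}
    for user, c in counts.items():
        views, carts, buys = c["pv"], c["cart"], c["buy"]
        per_day = list(daily[user].values())
        features[user] = {
            # Layer 1 (Basic): raw action counts
            "n_views": views,
            "n_carts": carts,
            "n_buys": buys,
            # Layer 2 (Conversion & Stability): efficiency and volatility
            "view_to_cart": carts / views if views else 0.0,
            "cart_to_buy": buys / carts if carts else 0.0,
            "daily_volatility": pstdev(per_day),
            # Layer 3 (Advanced): a hypothetical interaction term
            "cart_x_active_days": carts * len(per_day),
        }
    return features
```

A layer-wise evaluation would then train the same model on the same folds three times, each time adding the next group of columns, so that any change in the score is attributable to the features alone.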
The geological features of three types of tropical volcanic rock and soil distributed along the Jakarta-Bandung high-speed railway (HSR), namely pozzolanic clayey soil, mud shale and deep soft soil, are studied through field and laboratory tests. The paper analyzes the mechanisms and causes of the engineering geological problems associated with tropical volcanic rock and soil, and puts forward measures to control subgrade slope instability: rationally determining the project type, controlling side slope stability, and strengthening waterproofing and drainage. The "zero front slope" tunneling technology at the portal, the simplified excavation method of double-side wall heading, and the cross brace construction method of arch protection within the semi-open cut row pile frame in the "mountainside" eccentrically loaded soft soil stratum are adopted to control the instability of tunnel side and front slopes, foundation pits and working faces. CFG or pipe piles, or replacement measures, are used to reinforce soft and expansive foundations, and a scheme of blind ditch plus double-layer water sealing in the ballastless track section is put forward to prevent arching deformation of the foundation. CFG piles, pipe piles and vacuum combined piled preloading are adopted to improve the bearing capacity of the foundation in the deep soft soil section and to solve the problems of settlement control and uneven settlement. These engineering countermeasures have been applied during the construction of the Jakarta-Bandung HSR and have achieved good results.
Employee turnover presents considerable challenges for organizations, leading to increased recruitment costs and disruptions in ongoing operations. High voluntary attrition rates can result in substantial financial losses, making it essential for Human Resource (HR) departments to prioritize turnover reduction. In this context, Artificial Intelligence (AI) has emerged as a vital tool for strengthening business strategies and people management. This paper incorporates two new representative features and introduces three types of feature engineering to enhance the analysis of employee turnover in the IBM HR Analytics dataset. Key Machine Learning (ML) techniques were then employed to analyze employee turnover: Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Extreme Gradient Boosting (XGBoost), and especially Categorical Boosting (CatBoost), a gradient boosting algorithm optimized for categorical data. The feature engineering process enables CatBoost to improve model accuracy and robustness while effectively analyzing complex patterns within employee data. Experimental results demonstrate the effectiveness of the proposed methodology, which achieves the highest accuracy of 90.14% and an F1-score of 0.88 on the IBM dataset. To assess the capability of the detection system, we also used an extended dataset, achieving an accuracy of 98.10% and an F1-score of 0.98. These results indicate the efficiency of the proposed methodology and highlight the impact of feature engineering on predictive performance. Moreover, by pinpointing the top ten factors influencing attrition, including "Monthly Income", "Over Time", "Total Satisfaction", and others, this research equips HR departments with insights to implement targeted retention strategies, such as enhancing compensation or job satisfaction, to retain key talent before they consider leaving.
To ensure the safe and stable operation of rotating machinery, intelligent fault diagnosis methods hold significant research value. However, existing diagnostic approaches largely rely on manual feature extraction and expert experience, which limits their adaptability under variable operating conditions and strong noise, and severely affects the generalization capability of diagnostic models. To address this issue, this study proposes a multimodal fusion fault diagnosis framework based on Mel-spectrograms and automated machine learning (AutoML). The framework first extracts fault-sensitive Mel time–frequency features from acoustic signals and fuses them with statistical features of vibration signals to construct complementary fault representations. On this basis, AutoML techniques are introduced to construct the diagnostic workflow end to end and to obtain the optimal model configuration. Finally, diagnostic decisions are made by automatically integrating the predictions of multiple high-performance base models. Experimental results on a centrifugal pump vibration and acoustic dataset demonstrate that the proposed framework achieves high diagnostic accuracy under noise-free conditions and maintains strong robustness under noisy interference, validating its efficiency, scalability, and practical value for rotating machinery fault diagnosis.
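The fusion step described above can be sketched in a few lines. The example below uses the widely used HTK-style Mel formula and a handful of common vibration statistics (RMS, peak, crest factor, kurtosis). Which statistics the paper actually uses is not stated in the abstract, so the feature set and the function names here are assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def vibration_stats(signal):
    """Common time-domain statistics of a vibration signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    m2 = sum(c * c for c in centered) / n   # second central moment
    m4 = sum(c ** 4 for c in centered) / n  # fourth central moment
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms if rms else 0.0,
        "kurtosis": m4 / (m2 * m2) if m2 else 0.0,
    }


def fuse(mel_features, stats):
    """Concatenate acoustic Mel features with vibration statistics."""
    return list(mel_features) + [stats[key] for key in sorted(stats)]
```

Plain concatenation is the simplest fusion strategy. An AutoML system would then search over models and hyperparameters on the fused vector.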
The nature of measured data varies among the disciplines of the geosciences. In rock engineering, the features of the data play a leading role in determining which methods can properly handle it. The present study focuses on resolving one of the major deficiencies of conventional neural networks (NNs) in dealing with rock engineering data. Because the samples are obtained with the utmost difficulty from hundreds of meters below the surface, the number of samples is always limited. Meanwhile, the experimental analysis of these samples may produce many repeated values and zeros. Conventional neural networks are incapable of building robust models in the presence of such data, and their predictions also depend strongly on the initial weights and bias values. With this in mind, the current research introduces a novel neural network processing framework for geological data that does not suffer from the limitations of conventional NNs. The introduced single-data-based feature engineering network extracts all the information contained in every single data point without being affected by the other points. This method, which is completely different from conventional NNs, re-arranges all the basic elements of the neuron model into a new structure, so its mathematical formulation was derived from scratch. Moreover, the corresponding code was developed in MATLAB and Python, since no implementation could be found in any common software at the time. The new network was first evaluated through computer-based simulations of rock cracks in the 3DEC environment. After the model's reliability was confirmed, it was adopted in two case studies to estimate, respectively, the tensile strength and the shear strength of real rock samples. These were coal core samples from the Southern Qinshui Basin of China and gas hydrate-bearing sediment (GHBS) samples from the Nankai Trough of Japan. The coal samples underwent nuclear magnetic resonance (NMR) measurements and scanning electron microscopy (SEM) imaging to investigate their original micro and macro fractures. After these experiments, the rock mechanical properties, including tensile strength, were measured using a rock mechanical test system. The shear strength of the GHBS samples, in contrast, was obtained through triaxial and direct shear tests. According to the results, the new network structure outperformed conventional neural networks in both the simulation-based and the case-study estimations of tensile and shear strength. Although the proposed approach originally aimed at resolving the issue of limited datasets, its properties could also be applied to larger datasets from other subsurface measurements.
A new method for extracting blend surface features is presented. It contains two steps: segmentation, and recovery of the parametric representation of the blend. The segmentation separates the points in the blend region from the rest of the input point cloud by sampling the point data, estimating local surface curvature properties and comparing maximum curvature values. The recovery of the parametric representation generates a set of profile curves by marching through the blend and fitting cylinders. Compared with existing approaches to blend surface feature extraction, the proposed method requires less user interaction and can extract blend surfaces with either constant or variable radius. Application examples are presented to verify the proposed method.
State of health (SOH) estimation for e-mobility batteries operated in real, dynamic conditions is essential and challenging. Most existing estimation methods are based on fixed constant-current charging and discharging aging profiles, which overlook the fact that charging and discharging profiles are random and incomplete in real applications. This work investigates the influence of feature engineering on the accuracy of different machine learning (ML)-based SOH estimators acting on different recharging sub-profiles, where a realistic battery mission profile is considered. Fifteen features were extracted from the partial recharging profiles of the battery, considering factors such as starting voltage, charge amount, and charging sliding windows. Features were then selected with a feature selection pipeline consisting of filtering and supervised ML-based subset selection. Multiple linear regression (MLR), Gaussian process regression (GPR), and support vector regression (SVR) were applied to estimate SOH, and root mean square error (RMSE) was used to evaluate and compare the estimation performance. The results showed that the feature selection pipeline can improve SOH estimation accuracy by 55.05%, 2.57%, and 2.82% for MLR, GPR and SVR, respectively. Estimation based on partial charging profiles with a lower starting voltage, a large charge amount, and a large sliding window size is more likely to achieve higher accuracy. This work aims to give some insight into how supervised ML-based feature engineering on random partial recharges affects SOH estimation performance, and to help close the gap between theoretical study and real dynamic application in SOH estimation.
Diabetes is increasingly common in people's daily lives and represents an extraordinary threat to human well-being. Machine Learning (ML) in the healthcare industry has recently made headlines, and several ML models have been developed around different datasets for diabetes prediction. It is essential for ML models to predict diabetes accurately, and highly informative features in the dataset are vital to a model's predictive capability. Feature engineering (FE) is the way to obtain such highly informative features. The Pima Indian Diabetes Dataset (PIDD) is used in this work, and the impact of informative features on ML models for diabetes prediction is tested and analyzed. Missing values (MV) and the effect of the imputation process on the data distribution of each feature are analyzed. Permutation importance and partial dependence analyses are carried out extensively, and the results reveal that Glucose (GLUC), Body Mass Index (BMI), and Insulin (INS) are highly informative features. Derived features are obtained for BMI and INS to add more information alongside their raw form. An ensemble classifier combining AdaBoost (AB) and XGBoost (XB) is used to analyze the impact of the proposed FE approach. With the derived features included, the ensemble model performs well, reaching a high Diagnostic Odds Ratio (DOR) of 117.694. This shows a high margin of 8.2% when compared with the ensemble model with no derived features (DOR=96.306) included in the experiment. Adding the derived features to the FE approach of the current state of the art makes the ensemble model perform well, with Sensitivity (0.793), Specificity (0.945), DOR (79.517), and False Omission Rate (0.090), which further improves the state-of-the-art results.
Background: Deep 3D morphable models (deep 3DMMs) play an essential role in computer vision. They are used in facial synthesis, compression, reconstruction and animation, avatar creation, virtual try-on, facial recognition systems and medical imaging. These applications require high spatial and perceptual quality of the synthesised meshes. Despite their significance, these models have not been compared across different mesh representations or evaluated jointly with point-wise distance and perceptual metrics. Methods: We compare the influence of different mesh representation features, across various deep 3DMMs, on the spatial and perceptual fidelity of the reconstructed meshes. This paper confirms the hypothesis that building deep 3DMMs from meshes with global representations leads to lower spatial reconstruction error, measured with L_1 and L_2 norm metrics, but underperforms on perceptual metrics. In contrast, using differential mesh representations, which describe differential surface properties, yields lower (better) perceptual FMPD and DAME scores but higher spatial fidelity error. The influence of mesh feature normalisation and standardisation is also compared and analysed from the perceptual and spatial fidelity perspectives. Results: The results provide guidance for selecting mesh representations when building deep 3DMMs according to spatial and perceptual quality objectives, and propose combinations of mesh representations and deep 3DMMs that improve either the perceptual or the spatial fidelity of existing methods.
With the emergence of massive numbers of online courses, evaluating courses of varying quality, so as to better discriminate between them and to recommend personalized online learning resources to learners, requires assessment from all aspects. In this paper, a method for constructing an online course portrait based on feature engineering is proposed. First, the framework of the online course portrait is established and the related features of the portrait are extracted by feature engineering; then the indicator weights of the portrait are calculated with the entropy weight method. Finally, experiments are designed to evaluate the performance of the algorithms, and an example of a course portrait is given.
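The entropy weight method mentioned above assigns larger weights to indicators whose values differ more across the items being compared. A minimal sketch of the standard formulation follows, assuming a non-negative matrix with one row per course and one column per indicator, at least two rows, and non-zero column sums. The function name and the input layout are illustrative and are not taken from the paper.

```python
import math


def entropy_weights(matrix):
    """Return one weight per column using the entropy weight method."""
    n = len(matrix)     # number of items (e.g. courses)
    m = len(matrix[0])  # number of indicators
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total  # share of this item in indicator j
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # normalise entropy to [0, 1]
        # Low entropy means high divergence, i.e. a more informative indicator.
        divergences.append(max(0.0, 1.0 - entropy))
    total_div = sum(divergences)
    if total_div == 0:
        # All indicators are equally uninformative: fall back to equal weights.
        return [1.0 / m] * m
    return [d / total_div for d in divergences]
```

An indicator that takes the same value for every course has maximal entropy and therefore receives a weight of (essentially) zero, which matches the intuition that it cannot help tell courses apart.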
Machine learning is employed to comprehensively analyze and predict the hardenability of 20CrMo steel. The hardenability dataset includes J9 and J15 hardenability values, chemical composition, and heat treatment parameters. Various machine learning models, including linear regression (LR), k-nearest neighbors (KNN), random forest (RF), and extreme gradient boosting (XGBoost), are used to develop predictive models for the hardenability of 20CrMo steel. Among these models, XGBoost achieves the best performance, with coefficients of determination (R²) of 0.941 and 0.946 for predicting J9 and J15 values, respectively. The predictions fall within a ±2 HRC band for 98% of J9 cases and 99% of J15 cases. Additionally, SHapley Additive exPlanations (SHAP) analysis is used to identify the key elements that significantly influence the hardenability of 20CrMo steel. The analysis reveals that alloying elements such as Si, Cr, C, N and Mo play significant roles in hardenability. The strengths and weaknesses of the various machine learning models in predicting hardenability are also discussed.
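The two headline metrics above, R² and the share of predictions within a ±2 HRC band, are straightforward to compute. The sketch below shows both; the function names and the sample hardness values in the usage note are made up for illustration and do not come from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fraction_within_band(y_true, y_pred, tolerance=2.0):
    """Share of predictions whose absolute error is at most `tolerance`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)
```

For example, with measured values `[30, 32, 34, 36]` and predictions `[31, 32, 33, 39]`, `r_squared` gives 0.45 and `fraction_within_band` gives 0.75, because only the last prediction misses by more than 2 HRC.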
Superconducting radio-frequency (SRF) cavities are the core components of SRF linear accelerators, so their stable operation is of considerable importance. However, operational experience from different accelerator laboratories has revealed that SRF faults are the leading cause of short machine downtime trips. When a cavity fault occurs, system experts analyze the time-series data recorded by low-level RF systems and identify the fault type. This requires expertise and intuition, which poses a major challenge for control-room operators. Here, we propose an expert feature-based machine learning model for automating SRF cavity fault recognition. The main challenge in converting the "expert reasoning" process for SRF faults into a "model inference" process lies in feature extraction, because the associated time-series waveforms are multidimensional and complex. Existing autoregression-based feature-extraction methods require the signal to be stable and autocorrelated, which makes it difficult to capture the abrupt features present in several SRF failure patterns. To address these issues, we introduce expertise into the classification model through reasonable feature engineering. We demonstrate the feasibility of this method using the SRF cavities of the China accelerator facility for superheavy elements (CAFE2). Although specific faults in SRF cavities may vary across accelerators, similarities exist in the RF signals. Therefore, this study provides valuable guidance on fault analysis for the entire SRF community.
Although lithium-ion batteries (LIBs) currently dominate a wide spectrum of energy storage applications, they face challenges such as fast cycle life decay and poor stability that hinder their further application. To address these limitations, element doping has emerged as a prevalent strategy to enhance the discharge capacity and extend the durability of Li-Ni-Co-Mn (LNCM) ternary compounds. This study used a machine learning-driven feature screening method to pinpoint four key features that crucially affect the initial discharge capacity (IC) of LNCM ternary cathode materials. These features also proved highly predictive of the 50th-cycle discharge capacity (EC). Additionally, SHAP value analysis yielded an in-depth understanding of the interplay between these features and discharge performance. This insight offers valuable direction for the future development of LNCM cathode materials, moving the field toward greater efficiency and sustainability.
Phase classification provides clear guidance for the design of high entropy alloys. For mutually exclusive and non-mutually exclusive classifications, composition descriptors, commonly used physical parameter descriptors, elemental-property descriptors, and descriptors extracted from the periodic table representation (PTR) by a convolutional neural network were collected. Appropriate selection among these information-rich features helps phase classification. Based on random forest, the accuracy of the four-label classification and the balanced accuracy of the five-label classification were improved to 0.907 and 0.876, respectively. The roles of the four important features were summarized by interpretability analysis, and a new important feature was found. The model's extrapolation ability and the influence of Mo were demonstrated by phase prediction in (CoFeNiMn)_(1-x)Mo_x. Because phase information helps hardness prediction, the classification results were coupled with the PTR of the hardness data, and the prediction error (root mean square error) was reduced to 56.69.
Malware continues to pose a significant threat to cybersecurity, with new advanced infections that evade traditional detection. Limitations of existing systems include high false-positive rates, slow response times, and an inability to respond quickly to new forms of malware. To overcome these challenges, this paper proposes OMD-RAS (Implementing Malware Detection in an Optimized Way through Real-Time and Adaptive Security), an extensive approach aimed at better malware threat detection and remediation. The main steps of the model are data collection followed by comprehensive preprocessing, consisting of feature engineering and normalization. Static and dynamic analysis are both performed during feature extraction to capture the whole spectrum of malware behavior. The extracted and processed features are fed, with a continuous learning mechanism, to an Extreme Learning Machine model for real-time detection. OMD-RAS trains quickly and achieves high accuracy, providing advanced real-time detection capabilities. The approach uses continuous learning to adapt to new threats, so that detection remains effective even as malware strategies change over time. The experimental results showed that OMD-RAS performs better than traditional approaches. For instance, the OMD-RAS model achieved an accuracy of 96.23% and substantially reduced the false-positive rate across all datasets while maintaining consistently high precision and recall. The model's adaptive learning also improved other performance measures, for example the Matthews Correlation Coefficient and Log Loss.
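The two secondary metrics named above have simple closed forms. The following sketch computes the Matthews Correlation Coefficient from confusion-matrix counts and the binary log loss from predicted probabilities. The function names and the clipping constant `eps` are illustrative choices, not details from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when undefined."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)
```

MCC is a useful complement to accuracy on imbalanced malware datasets because it only approaches 1 when both the malicious and the benign class are predicted well.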
Accurate short-term forecasting of offshore wind fields is still challenging for numerical weather prediction models. Based on three years of 48-hour forecast data from the European Centre for Medium-Range Weather Forecasts Integrated Forecasting System global model (ECMWF-IFS) over 14 offshore weather stations along the coast of Shandong Province, this study introduces a multi-task learning (MTL) model (TabNet-MTL), which significantly reduces the forecast bias of near-surface wind direction and speed simultaneously. TabNet-MTL adopts feature engineering, uses mean square error as the loss function, and employs 5-fold cross-validation to ensure the generalization ability of the trained model. It demonstrates superior skill in wind field correction across different forecast lead times at all stations compared to its single-task version (TabNet-STL) and three other popular single-task learning models (Random Forest, LightGBM, and XGBoost). Results show that it reduces the root mean square error of the ECMWF-IFS wind speed forecast from 2.20 to 1.25 m s⁻¹ and increases the forecast accuracy of wind direction from 50% to 65%. Because TabNet-MTL is an explainable deep learning model, the weather station and the long-term temporal statistics of near-surface wind speed can be identified as its most influential engineered features.
With the rapid advancement of mobile communication networks, key technologies such as Multi-access Edge Computing (MEC) and Network Function Virtualization (NFV) have enhanced the quality of service for 5G users but have also significantly increased the complexity of network threats. Traditional static defense mechanisms are inadequate for addressing the dynamic and heterogeneous nature of modern attack vectors. To overcome these challenges, this paper presents a novel algorithmic framework, SD-5G, designed for high-precision intrusion detection in 5G environments. SD-5G adopts a three-stage architecture comprising traffic feature extraction, elastic representation, and adaptive classification. Specifically, an enhanced Concrete Autoencoder (CAE) is employed to reconstruct and compress high-dimensional network traffic features, producing compact and expressive representations suitable for large-scale 5G deployments. To further improve accuracy in ambiguous traffic classification, a Residual Convolutional Long Short-Term Memory model with an attention mechanism (ResCLA) is introduced, enabling multi-level modeling of spatial–temporal dependencies and effective detection of subtle anomalies. Extensive experiments on benchmark datasets (5G-NIDD, CIC-IDS2017, ToN-IoT, and BoT-IoT) demonstrate that SD-5G consistently achieves F1 scores exceeding 99.19% across diverse network environments, indicating strong generalization and real-time deployment capabilities. Overall, SD-5G balances detection accuracy and deployment efficiency, offering a scalable, flexible, and effective solution for intrusion detection in 5G and next-generation networks.
A mortality prediction model is established for a small cohort of acute myocardial infarction (AMI) patients with a low death rate. In total, 1639 AMI patients who received treatment in seven tertiary and secondary hospitals in Shanghai between January 1, 2016 and January 1, 2018 were selected as research subjects. Among them, 72 patients died during the two-year follow-up. Models are established with an ensemble learning framework and machine learning algorithms based on 51 physiological indicators of each patient. The Shapley additive explanations algorithm and univariate tests with point-biserial and phi correlation coefficients are employed to determine significant features and rank feature importance. Based on a 5-fold cross-validation experiment and external validation, the prediction model using the self-paced ensemble framework and the random forest algorithm achieves the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.911 and a recall of 0.864. Both feature ranking methods showed that ejection fraction, serum creatinine (at admission), hemoglobin and Killip class are the most important features. With these top-ranked features, a simplified prediction model achieves a comparable result, with an AUROC of 0.872 and a recall of 0.818. This work proposes a new method for establishing mortality prediction models for AMI patients based on the self-paced ensemble framework, which allows models to achieve high performance with a small cohort and a low death rate. It can serve as a new reference to assist medical decision-making and prognosis.
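The univariate screening step above relies on the point-biserial correlation, which measures the association between a binary outcome (for example, died or survived) and a continuous indicator. A minimal sketch follows, assuming 0/1 labels with both classes present and a non-constant indicator; the function name and the sample values in the usage note are illustrative only.

```python
import math
from statistics import mean, pstdev


def point_biserial(values, labels):
    """Point-biserial correlation between a continuous and a 0/1 variable."""
    group1 = [v for v, y in zip(values, labels) if y == 1]
    group0 = [v for v, y in zip(values, labels) if y == 0]
    n, n1, n0 = len(values), len(group1), len(group0)
    # Difference of group means, scaled by the population standard deviation
    # of all values and by the class balance.
    spread = pstdev(values)
    return (mean(group1) - mean(group0)) / spread * math.sqrt(n1 * n0 / n ** 2)
```

For instance, `point_biserial([1, 2, 3, 4], [0, 0, 1, 1])` is about 0.894, the same value a Pearson correlation would give for these two columns. With an outcome as rare as 72 deaths in 1639 patients, the class-balance factor shrinks the coefficient, which is one reason such rankings are usually read alongside a model-based method like SHAP.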
Shield attitudes, which are governed by intricate mechanisms, affect segment assembly quality and tunnel axis deviation. In data-driven prediction, however, existing methods that use the original driving parameters fail to deliver convincing performance because they do not sufficiently consider the complicated interactions among the parameters. Therefore, a multi-dimensional feature synthesizing and screening method is proposed to find the optimal features that better reflect the physical mechanism when predicting shield tunneling attitudes. Features embedding physical knowledge were synthesized along seven dimensions and validated by the clustering quality of their Shapley Additive Explanations (SHAP) values. Subsequently, a novel index, the Expected Impact Index (EII), is proposed for reliably screening the optimal features. Finally, a Bayesian-optimized deep learning model was established to validate the proposed method in a case study. Results show that the proposed method effectively identifies the optimal parameters for shield attitude prediction, with an average Mean Squared Error (MSE) reduction of 27.3%. The proposed method effectively combines shield driving data with the physical mechanism, providing a valuable reference for shield deviation control.
Abstract: To overcome the problem of imprecise and unclear information in quality function deployment (QFD), a method for determining the priority of engineering features based on mixed linguistic variables is proposed. First, each evaluation member uses the designated linguistic variables to give the correlation-strength evaluation matrix between customer requirements and engineering features. Second, the relative importance of the evaluation members and of the customer requirements is aggregated. Finally, the priority of the engineering features is obtained by calculating the deviation. The feasibility and practicability of the method are demonstrated using the design of a new long-bag low-pressure pulse dust collector as an example.
Funding: Supported by the research fund of Hanyang University (HY-202500000001616).
Abstract: Accurate purchase prediction in e-commerce critically depends on the quality of behavioral features. This paper proposes a layered and interpretable feature engineering framework that organizes user signals into three layers: Basic, Conversion & Stability (efficiency and volatility across actions), and Advanced Interactions & Activity (cross-behavior synergies and intensity). Using real Taobao (Alibaba's primary e-commerce platform) logs (57,976 records for 10,203 users; 25 November–03 December 2017), we conducted a hierarchical, layer-wise evaluation that holds data splits and hyperparameters fixed while varying only the feature set, to quantify each layer's marginal contribution. Across logistic regression (LR), decision tree, random forest, XGBoost, and CatBoost models with stratified 5-fold cross-validation, performance improved monotonically from Basic to Conversion & Stability to Advanced features. With LR, F1 increased from 0.613 (Basic) to 0.962 (Advanced); boosted models achieved high discrimination (AUC of 0.995) and an F1 score of up to 0.983. Calibration and precision–recall analyses indicated strong ranking quality; potential dataset and period biases are acknowledged given the short (9-day) window. By making feature contributions measurable and reproducible, the framework complements model-centric advances and offers a transparent blueprint for production-grade behavioral modeling. The code and processed artifacts are publicly available, and future work will extend the validation to longer, seasonal datasets and to hybrid approaches that combine automated feature learning with domain-driven design.
Abstract: The geological features of three types of tropical volcanic rock and soil distributed along the Jakarta-Bandung high-speed railway (HSR), namely pozzolanic clayey soil, mud shale, and deep soft soil, are studied through field and laboratory tests. The paper analyzes the mechanisms and causes of the engineering geological problems associated with tropical volcanic rock and soil, and puts forward measures to control subgrade slope instability: rationally determining the project type, controlling side-slope stability, and strengthening waterproofing and drainage. To control the instability of tunnel side and front slopes, foundation pits, and working faces, the "zero front slope" tunneling technology at the portal, the simplified double-side-wall heading excavation method, and the cross-brace construction method with arch protection inside a semi-open-cut row-pile frame in "mountainside" eccentrically loaded soft soil strata are adopted. CFG or pipe piles, or replacement measures, are used to reinforce soft and expansive foundations, and a scheme of blind ditch plus double-layer water sealing is proposed for ballastless track sections to prevent arching deformation of the foundation. CFG piles, pipe piles, and vacuum combined piled preloading are adopted to improve the bearing capacity of the foundation in deep soft soil sections and to address settlement control and uneven settlement. These engineering countermeasures were applied during the construction of the Jakarta-Bandung HSR and achieved good results.
Funding: Supported by the Innovative Human Resource Development for Local Intellectualization program through the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (IITP-2024-00156287, 50%); by the IITP under the Artificial Intelligence Convergence Innovation Human Resources Development grant (IITP-2023-RS-2023-00256629, 25%) funded by the Korea government (MSIT); and by the Korea Internet & Security Agency (KISA) Information Security College Support Project (25%).
Abstract: Employee turnover presents considerable challenges for organizations, leading to increased recruitment costs and disruptions in ongoing operations. High voluntary attrition rates can result in substantial financial losses, making it essential for Human Resource (HR) departments to prioritize turnover reduction. In this context, Artificial Intelligence (AI) has emerged as a vital tool for strengthening business strategies and people management. This paper incorporates two new representative features and introduces three types of feature engineering to enhance the analysis of employee turnover in the IBM HR Analytics dataset. Key Machine Learning (ML) techniques were then employed to analyze employee turnover, including Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Extreme Gradient Boosting (XGBoost), and especially Categorical Boosting (CatBoost), a gradient boosting algorithm optimized for categorical data. The feature engineering process enables CatBoost to improve model accuracy and robustness while effectively analyzing complex patterns within employee data. Experimental results demonstrate the effectiveness of the proposed methodology, which achieves a highest accuracy of 90.14% and an F1-score of 0.88 on the IBM dataset. To further assess the capability of the detection system, an extended dataset was also used, yielding an optimal accuracy of 98.10% and an F1-score of 0.98. These results indicate the efficiency of the proposed methodology and highlight the impact of feature engineering on predictive performance. Moreover, by pinpointing the top ten factors influencing attrition, including "Monthly Income", "Over Time", "Total Satisfaction", and others, this research equips HR departments with insights for implementing targeted retention strategies, such as enhancing compensation or job satisfaction, to retain key talent before they consider leaving.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 52475102 and 52205101; in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2023A1515240021; in part by the Young Talent Support Project of Guangzhou Association for Science and Technology (QT-2024-28); and in part by the Youth Development Initiative of Guangdong Association for Science and Technology (SKXRC2025254).
Abstract: Intelligent fault diagnosis methods hold significant research value for ensuring the safe and stable operation of rotating machinery. However, existing diagnostic approaches largely rely on manual feature extraction and expert experience, which limits their adaptability under variable operating conditions and strong noise and severely affects the generalization capability of diagnostic models. To address this issue, this study proposes a multimodal fusion fault diagnosis framework based on Mel-spectrograms and automated machine learning (AutoML). The framework first extracts fault-sensitive Mel time–frequency features from acoustic signals and fuses them with statistical features of vibration signals to construct complementary fault representations. On this basis, AutoML techniques are introduced to construct the end-to-end diagnostic workflow and obtain the optimal model configuration. Finally, diagnostic decisions are made by automatically integrating the predictions of multiple high-performance base models. Experimental results on a centrifugal pump vibration and acoustic dataset demonstrate that the proposed framework achieves high diagnostic accuracy under noise-free conditions and maintains strong robustness under noisy interference, validating its efficiency, scalability, and practical value for rotating machinery fault diagnosis.
Abstract: The nature of measured data varies among the different disciplines of the geosciences. In rock engineering, the features of the data play a leading role in determining which methods can properly handle it. The present study focuses on resolving one of the major deficiencies of conventional neural networks (NNs) in dealing with rock engineering data. Because samples are obtained with great difficulty from hundreds of meters below the surface, the number of samples is always limited. Moreover, the experimental analysis of these samples may yield many repeated values and zeros, and conventional neural networks are incapable of building robust models in the presence of such data. These networks also depend strongly on the initial weights and bias values for making reliable predictions. With this in mind, the current research introduces a novel kind of neural network processing framework for geological data that does not suffer from the limitations of conventional NNs. The introduced single-data-based feature engineering network extracts all the information contained in every single data point without being affected by the other points. This method, which is completely different from conventional NNs, rearranges all the basic elements of the neuron model into a new structure; its mathematical formulation was therefore derived from scratch. The corresponding code was developed in MATLAB and Python, since no implementation was available in common software at the time. The new network was first evaluated through computer-based simulations of rock cracks in the 3DEC environment. After the model's reliability was confirmed, it was applied in two case studies to estimate, respectively, the tensile strength and shear strength of real rock samples: coal core samples from the Southern Qinshui Basin of China, and gas hydrate-bearing sediment (GHBS) samples from the Nankai Trough of Japan. The coal samples underwent nuclear magnetic resonance (NMR) measurements and scanning electron microscopy (SEM) imaging to investigate their original micro- and macro-fractures. After these experiments, the rock mechanical properties, including tensile strength, were measured using a rock mechanical test system. The shear strength of the GHBS samples was acquired through triaxial and direct shear tests. According to the results, the new network structure outperformed conventional neural networks in both the simulation-based and the case-study estimations of tensile and shear strength. Although the proposed approach originally aimed at resolving the issue of limited datasets, its properties could also be applied to larger datasets from other subsurface measurements.
Funding: This project is supported by General Electric Corporate Research and Development and the National Advanced Technology Project of China (No. 863-511-942-018).
Abstract: A new method for extracting blend surface features is presented. It consists of two steps: segmentation, and recovery of the parametric representation of the blend. The segmentation separates the points in the blend region from the rest of the input point cloud by sampling the point data, estimating local surface curvature properties, and comparing maximum curvature values. The recovery of the parametric representation generates a set of profile curves by marching through the blend and fitting cylinders. Compared with existing approaches to blend surface feature extraction, the proposed method requires less user interaction and can extract blend surfaces with either constant or variable radius. Application examples are presented to verify the proposed method.
Funding: Funded by the China Scholarship Council (fund numbers 202108320111 and 202208320055).
Abstract: State of health (SOH) estimation for e-mobility applications operated under real, dynamic conditions is essential and challenging. Most existing estimation methods are based on fixed constant-current charging and discharging aging profiles, which overlooks the fact that charging and discharging profiles in real applications are random and incomplete. This work investigates the influence of feature engineering on the accuracy of different machine learning (ML)-based SOH estimators acting on different recharging sub-profiles, where a realistic battery mission profile is considered. Fifteen features were extracted from the partial recharging profiles of the battery, considering factors such as starting voltage, charge amount, and charging sliding windows. Features were then selected using a feature selection pipeline consisting of filtering and supervised ML-based subset selection. Multiple linear regression (MLR), Gaussian process regression (GPR), and support vector regression (SVR) were applied to estimate SOH, and root mean square error (RMSE) was used to evaluate and compare estimation performance. The results showed that the feature selection pipeline can improve SOH estimation accuracy by 55.05%, 2.57%, and 2.82% for MLR, GPR, and SVR, respectively. Estimation based on partial charging profiles with a lower starting voltage, a larger charge amount, and a larger sliding window size was shown to be more likely to achieve higher accuracy. This work aims to give some insight into how supervised ML-based feature engineering on random partial recharges affects SOH estimation performance, and to help close the gap in effective SOH estimation between theoretical study and real dynamic application.
Abstract: Diabetes is increasingly common in daily life and represents an extraordinary threat to human well-being. Machine Learning (ML) in the healthcare industry has recently made headlines, and several ML models have been developed around different datasets for diabetes prediction. It is essential for ML models to predict diabetes accurately, and highly informative features of the dataset are vital in determining the capability of a model to do so. Feature engineering (FE) is the way forward for obtaining highly informative features. The Pima Indian Diabetes Dataset (PIDD) is used in this work, and the impact of informative features on ML models for diabetes prediction is tested and analyzed. Missing values (MV) and the effect of the imputation process on the data distribution of each feature are analyzed. Permutation importance and partial dependence analyses are carried out extensively, and the results reveal that Glucose (GLUC), Body Mass Index (BMI), and Insulin (INS) are highly informative features. Derived features are obtained for BMI and INS to add information beyond their raw form. An ensemble classifier combining AdaBoost (AB) and XGBoost (XB) is used for the impact analysis of the proposed FE approach. The ensemble model performs well when the derived features are included, with a high Diagnostic Odds Ratio (DOR) of 117.694. This shows a high margin of 8.2% compared with the ensemble model with no derived features included in the experiment (DOR = 96.306). Including derived features in the FE approach of the current state of the art made the ensemble model perform well, with Sensitivity (0.793), Specificity (0.945), DOR (79.517), and False Omission Rate (0.090), which further improves on the state-of-the-art results.
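The metrics named in this abstract (sensitivity, specificity, Diagnostic Odds Ratio, False Omission Rate) all follow from a binary confusion matrix. The sketch below uses their standard definitions; the function name `diagnostic_metrics` and the counts in the usage note are hypothetical and are not taken from the paper.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.

    Assumes fp > 0 and fn > 0 so that the odds ratio is finite.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    # DOR = odds of a positive test in the diseased group divided by
    # the odds of a positive test in the healthy group.
    dor = (tp * tn) / (fp * fn)
    # FOR = share of negative predictions that are actually positive.
    false_omission_rate = fn / (fn + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "false_omission_rate": false_omission_rate,
    }
```

With illustrative counts, `diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)` gives a sensitivity of 0.8, a specificity of 0.9, and a DOR of 36.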
Funding: Supported by the Centre for Digital Entertainment at Bournemouth University, by the UK Engineering and Physical Sciences Research Council (EPSRC) (EP/L016540/1), and by Humain Ltd.
Abstract: Background: Deep 3D morphable models (deep 3DMMs) play an essential role in computer vision. They are used in facial synthesis, compression, reconstruction and animation, avatar creation, virtual try-on, facial recognition systems, and medical imaging. These applications require high spatial and perceptual quality of the synthesised meshes. Despite their significance, these models have not been compared across different mesh representations or evaluated jointly with point-wise distance and perceptual metrics. Methods: We compare the influence of different mesh representation features, across various deep 3DMMs, on the spatial and perceptual fidelity of the reconstructed meshes. This paper proves the hypothesis that building deep 3DMMs from meshes represented with global representations leads to lower spatial reconstruction error, measured with L1 and L2 norm metrics, but underperforms on perceptual metrics. In contrast, using differential mesh representations, which describe differential surface properties, yields lower perceptual FMPD and DAME errors and higher spatial fidelity error. The influence of mesh feature normalisation and standardisation is also compared and analysed from the perceptual and spatial fidelity perspectives. Results: The results presented in this paper provide guidance for selecting mesh representations to build deep 3DMMs according to spatial and perceptual quality objectives, and propose combinations of mesh representations and deep 3DMMs that improve either the perceptual or the spatial fidelity of existing methods.
Funding: This work is supported by the National Key Research and Development Program of China (Grant No. 2020AAA0108803).
Abstract: With the emergence of massive online courses, courses of varying quality need to be evaluated from all aspects in order to improve the discrimination between courses and to recommend personalized online learning resources to learners. In this paper, a method for constructing an online course portrait based on feature engineering is proposed. First, the framework of the online course portrait is established and the related features of the portrait are extracted by feature engineering; then the indicator weights of the portrait are calculated by the entropy weight method. Finally, experiments are designed to evaluate the performance of the algorithms, and an example of a course portrait is given.
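The entropy weight method mentioned in this abstract can be sketched as follows. This is the textbook formulation, not necessarily the exact variant the authors used: it assumes a score matrix with one row per course and one column per indicator, non-negative benefit-type scores that have already been normalized, at least two rows, and positive column sums. The function name `entropy_weights` is hypothetical.

```python
import math


def entropy_weights(matrix):
    """Indicator weights via the entropy weight method.

    matrix: list of rows (courses), each a list of non-negative
    indicator scores. Assumes at least two rows, positive column
    sums, and at least one column whose values are not all equal.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    k = 1.0 / math.log(n_rows)  # scales entropy into [0, 1]

    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:  # the limit of p * ln(p) as p -> 0 is 0
                entropy -= k * p * math.log(p)
        # Lower entropy means the indicator discriminates more.
        divergences.append(1.0 - entropy)

    total_divergence = sum(divergences)
    return [d / total_divergence for d in divergences]
```

An indicator on which every course scores the same carries no discriminating information and receives a weight of (approximately) zero; for example, `entropy_weights([[1, 1], [1, 3]])` assigns essentially all weight to the second indicator.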
Funding: Supported by the Key Scientific and Technological Project Plan of Hebei Iron and Steel Group (No. HG2023235).
Abstract: Machine learning is employed to comprehensively analyze and predict the hardenability of 20CrMo steel. The hardenability dataset includes J9 and J15 hardenability values, chemical composition, and heat treatment parameters. Various machine learning models, including linear regression (LR), k-nearest neighbors (KNN), random forest (RF), and extreme gradient boosting (XGBoost), are used to develop predictive models for the hardenability of 20CrMo steel. Among these models, XGBoost achieves the best performance, with coefficients of determination (R²) of 0.941 and 0.946 for predicting J9 and J15 values, respectively. The predictions fall within a ±2 HRC band for 98% of J9 cases and 99% of J15 cases. Additionally, SHapley Additive exPlanations (SHAP) analysis is used to identify the key elements that significantly influence the hardenability of 20CrMo steel. The analysis reveals that alloying elements such as Si, Cr, C, N, and Mo play significant roles in hardenability. The strengths and weaknesses of the various machine learning models in predicting hardenability are also discussed.
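The two evaluation measures quoted in this abstract, the coefficient of determination and the share of predictions inside a ±2 HRC band, can be computed as in the minimal sketch below. The function names and the hardness values in the usage note are hypothetical and do not come from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    Assumes y_true is not constant (SS_tot > 0).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fraction_within_band(y_true, y_pred, half_width=2.0):
    """Share of predictions whose absolute error is <= half_width."""
    hits = sum(
        1 for t, p in zip(y_true, y_pred) if abs(t - p) <= half_width
    )
    return hits / len(y_true)
```

For illustrative measured values `[30, 35, 40, 45]` HRC and predictions `[31, 34, 43, 45]`, three of the four predictions fall inside the ±2 HRC band, so `fraction_within_band` returns 0.75.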
Funding: Supported by the project Studies of Intelligent LLRF Control Algorithms for Superconducting RF Cavities (No. E129851YR0), the National Natural Science Foundation of China (No. U22A20261), and the project Applications of Artificial Intelligence in the Stability Study of Superconducting Linear Accelerators (No. E429851YR0).
Abstract: Superconducting radio-frequency (SRF) cavities are the core components of SRF linear accelerators, which makes their stable operation highly important. However, operational experience from different accelerator laboratories has revealed that SRF faults are the leading cause of short machine downtime trips. When a cavity fault occurs, system experts analyze the time-series data recorded by the low-level RF systems and identify the fault type. This requires expertise and intuition, posing a major challenge for control-room operators. Here, we propose an expert feature-based machine learning model for automating SRF cavity fault recognition. The main challenge in converting the "expert reasoning" process for SRF faults into a "model inference" process lies in feature extraction, owing to the multidimensional and complex time-series waveforms involved. Existing autoregression-based feature-extraction methods require the signal to be stable and autocorrelated, which makes it difficult to capture the abrupt features present in several SRF failure patterns. To address these issues, we introduce expertise into the classification model through reasonable feature engineering. We demonstrate the feasibility of this method using the SRF cavities of the China Accelerator Facility for Superheavy Elements (CAFE2). Although specific faults in SRF cavities may vary across accelerators, similarities exist in the RF signals. Therefore, this study provides valuable guidance on fault analysis for the entire SRF community.
Funding: Supported by the National Natural Science Foundation of China (Nos. 52122408, 52071023); the Program for Science & Technology Innovation Talents in the University of Henan Province (No. 22HASTIT1006); the Program for Central Plains Talents (No. ZYYCYU202012172); the Ministry of Education, Singapore (No. RG70/20); and the Opening Project of the National Joint Engineering Research Center for Abrasion Control and Molding of Metal Materials, Henan University of Science and Technology (No. HKDNM201906).
Abstract: Although lithium-ion batteries (LIBs) currently dominate a wide spectrum of energy storage applications, they face challenges such as fast cycle-life decay and poor stability that hinder their further application. To address these limitations, element doping has emerged as a prevalent strategy for enhancing the discharge capacity and extending the durability of Li-Ni-Co-Mn (LNCM) ternary compounds. This study used a machine learning-driven feature screening method to pinpoint four key features that crucially affect the initial discharge capacity (IC) of LNCM ternary cathode materials. These features also proved highly predictive of the 50th-cycle discharge capacity (EC). Additionally, SHAP value analysis yielded an in-depth understanding of the interplay between these features and discharge performance. This insight offers valuable direction for future development of LNCM cathode materials, promoting greater efficiency and sustainability in the field.
Funding: Supported by the National Natural Science Foundation of China (Nos. 51671075, 51971086) and the Natural Science Foundation of Heilongjiang Province, China (No. LH2022E081).
Abstract: Phase classification provides clear guidance for the design of high-entropy alloys. For both mutually exclusive and non-mutually exclusive classifications, composition descriptors, commonly used physical parameter descriptors, elemental-property descriptors, and descriptors extracted from the periodic table representation (PTR) by a convolutional neural network were collected. Appropriate selection among these information-rich features helps phase classification. Based on random forest, the accuracy of the four-label classification and the balanced accuracy of the five-label classification were improved to 0.907 and 0.876, respectively. The roles of four important features were summarized through interpretability analysis, and a new important feature was found. The extrapolation ability of the model and the influence of Mo were demonstrated by phase prediction in (CoFeNiMn)₁₋ₓMoₓ. Phase information also helps hardness prediction: when the classification results were coupled with the PTR of the hardness data, the prediction error (root mean square error) was reduced to 56.69.
Funding: Supported by a grant from the Center of Excellence in Information Assurance (CoEIA), King Saud University (KSU).
Abstract: Malware continues to pose a significant threat to cybersecurity, with new advanced infections that evade traditional detection. Limitations of existing systems include high false-positive rates, slow response times, and an inability to respond quickly to new forms of malware. To overcome these challenges, this paper proposes OMD-RAS (Implementing Malware Detection in an Optimized Way through Real-Time and Adaptive Security), an extensive approach aimed at better malware threat detection and remediation. The main steps of the model are data collection followed by comprehensive preprocessing consisting of feature engineering and normalization. Static and dynamic analyses are both performed during feature extraction to capture the whole spectrum of malware behavior. The extracted and processed features are fed, with a continuous learning mechanism, to an Extreme Learning Machine model for real-time detection. OMD-RAS trains quickly and achieves high accuracy, providing advanced real-time detection capabilities. The approach uses continuous learning to adapt to new threats, so that detection remains effective even as malware strategies change over time. The experimental results showed that OMD-RAS performs better than traditional approaches. For instance, the OMD-RAS model achieved an accuracy of 96.23% and substantially reduced the false-positive rate across all datasets while maintaining consistently high precision and recall. The adaptive learning of the model also improved other performance measures, for example the Matthews Correlation Coefficient and Log Loss.
Funding: Supported by the National Key Research and Development Plan of China [Grant No. 2023YFB3002400]; the Shanghai 2021 Natural Science Foundation [Grant Nos. 21ZR1420400 and 21ZR1419800]; the Shanghai 2023 Natural Science Foundation [Grant No. 23ZR1463000]; and the Shandong Provincial Meteorological Bureau Scientific Research Project [Grant No. 2023SDBD05].
Abstract: Accurate short-term forecasting of offshore wind fields is still challenging for numerical weather prediction models. Based on three years of 48-hour forecast data from the European Centre for Medium-Range Weather Forecasts Integrated Forecasting System global model (ECMWF-IFS) at 14 offshore weather stations along the coast of Shandong Province, this study introduces a multi-task learning (MTL) model, TabNet-MTL, which significantly reduces the forecast bias of near-surface wind direction and speed simultaneously. TabNet-MTL adopts feature engineering, uses mean square error as the loss function, and employs 5-fold cross-validation to ensure the generalization ability of the trained model. It demonstrates superior skill in wind field correction across different forecast lead times at all stations compared with its single-task version (TabNet-STL) and three other popular single-task learning models (Random Forest, LightGBM, and XGBoost). Results show that it significantly reduces the root mean square error of the ECMWF-IFS wind speed forecast from 2.20 to 1.25 m s⁻¹ and increases the forecast accuracy of wind direction from 50% to 65%. Because TabNet-MTL is an explainable deep learning model, the weather stations and the long-term temporal statistics of near-surface wind speed can be identified as the most influential variables in its feature engineering.
Abstract: With the rapid advancement of mobile communication networks, key technologies such as Multi-access Edge Computing (MEC) and Network Function Virtualization (NFV) have enhanced the quality of service for 5G users but have also significantly increased the complexity of network threats. Traditional static defense mechanisms are inadequate for addressing the dynamic and heterogeneous nature of modern attack vectors. To overcome these challenges, this paper presents a novel algorithmic framework, SD-5G, designed for high-precision intrusion detection in 5G environments. SD-5G adopts a three-stage architecture comprising traffic feature extraction, elastic representation, and adaptive classification. Specifically, an enhanced Concrete Autoencoder (CAE) is employed to reconstruct and compress high-dimensional network traffic features, producing compact and expressive representations suitable for large-scale 5G deployments. To further improve accuracy in ambiguous traffic classification, a Residual Convolutional Long Short-Term Memory model with an attention mechanism (ResCLA) is introduced, enabling multi-level modeling of spatial–temporal dependencies and effective detection of subtle anomalies. Extensive experiments on benchmark datasets, including 5G-NIDD, CIC-IDS2017, ToN-IoT, and BoT-IoT, demonstrate that SD-5G consistently achieves F1 scores exceeding 99.19% across diverse network environments, indicating strong generalization and real-time deployment capabilities. Overall, SD-5G achieves a balance between detection accuracy and deployment efficiency, offering a scalable, flexible, and effective solution for intrusion detection in 5G and next-generation networks.
Funding: Supported by the National Natural Science Foundation of China (No. 81900308).
Abstract: A mortality prediction model is established for a small cohort of acute myocardial infarction (AMI) patients with a low death rate. In total, 1639 AMI patients who received treatment in seven tertiary and secondary hospitals in Shanghai between January 1, 2016 and January 1, 2018 were selected as research subjects. Among them, 72 patients died during the two-year follow-up. Models are established with an ensemble learning framework and machine learning algorithms based on 51 physiological indicators of the patients. The Shapley additive explanations algorithm and univariate tests with point-biserial and phi correlation coefficients are employed to determine significant features and rank feature importance. Based on a 5-fold cross-validation experiment and external validation, the prediction model using the self-paced ensemble framework with the random forest algorithm achieves the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.911 and a recall of 0.864. Both feature ranking methods showed that ejection fraction, serum creatinine (at admission), hemoglobin, and Killip class are the most important features. With these top-ranked features, the simplified prediction model achieves a comparable result, with an AUROC of 0.872 and a recall of 0.818. This work proposes a new method for establishing mortality prediction models for AMI patients based on the self-paced ensemble framework, which allows models to achieve high performance on a small cohort of patients with a low death rate. It can assist medical decision-making and prognosis as a new reference.
Abstract: Shield attitudes, which are governed by intricate mechanisms, affect segment assembly quality and tunnel axis deviation. In data-driven prediction, however, existing methods that use the original driving parameters fail to deliver convincing performance because they insufficiently consider the complicated interactions among the parameters. Therefore, a multi-dimensional feature synthesizing and screening method is proposed to explore the optimal features that better reflect the physical mechanism in predicting shield tunneling attitudes. Features embedded with physical knowledge were synthesized from seven dimensions and validated by the clustering quality of their Shapley Additive Explanations (SHAP) values. Subsequently, a novel index, the Expected Impact Index (EII), is proposed for reliably screening the optimal features. Finally, a Bayesian-optimized deep learning model was established to validate the proposed method in a case study. Results show that the proposed method effectively identifies the optimal parameters for shield attitude prediction, with an average Mean Squared Error (MSE) reduction of 27.3%. The proposed method achieved effective assimilation of shield driving data with the physical mechanism, providing a valuable reference for shield deviation control.