Titanium-based semiconductors are known for their high chemical stability and suitable band gaps. However, conventional experimental screening is inefficient because of the wide variety of candidate materials. To speed up the selection process, this work focuses on interpretable feature learning and band gap prediction for titanium-based semiconductors. First, titanium compounds were selected from the Materials Project database by machine learning, and elemental features were extracted using the Magpie descriptors. Then, principal component analysis (PCA) was applied to reduce the data dimensionality, creating a representative dataset. Meanwhile, heatmaps and SHAP (SHapley Additive exPlanations) were used to show the influence of key features such as electronegativity, covalent radius, period number, and unit cell volume on the band gap, clarifying the relationship between a material's properties and its performance. After comparing several machine learning models, including Random Forest (RF), Support Vector Machines (SVM), Linear Regression (LR), and Gradient Boosting Regression (GBR), RF was found to be the most accurate for band gap prediction. Finally, the model's performance was improved through parameter tuning, showing high accuracy. These findings provide strong data support and design guidance for the development of materials in fields such as photocatalysis and solar cells.
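The workflow summarized in the abstract (dimensionality reduction with PCA followed by a comparison of RF, SVM, LR, and GBR regressors) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code. The feature matrix and target below are synthetic stand-ins for the Magpie descriptors and Materials Project band gaps, and the function name `compare_models`, the number of principal components, the fold count, and all hyperparameters are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 300 "compounds" with 20 numeric
# features in place of Magpie descriptors, and a made-up nonlinear target
# in place of measured or computed band gaps.
rng = np.random.default_rng(0)
n_samples, n_features = 300, 20
X = rng.normal(size=(n_samples, n_features))
y = (2.0 + np.abs(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]
     + 0.1 * rng.normal(size=n_samples))

# The four model families named in the abstract, with illustrative settings.
models = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVM": SVR(),
    "LR": LinearRegression(),
    "GBR": GradientBoostingRegressor(random_state=0),
}


def compare_models(X, y, n_components=10, cv=5):
    """Return the mean cross-validated R^2 for each candidate model.

    Scaling and PCA sit inside the pipeline so they are refit on each
    training fold, which avoids leaking held-out data into the projection.
    """
    scores = {}
    for name, model in models.items():
        pipe = make_pipeline(StandardScaler(), PCA(n_components=n_components), model)
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="r2").mean()
    return scores


scores = compare_models(X, y)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean R^2 = {score:.3f}")
```

On real data, the ranking depends on the descriptors and the tuning. The parameter-tuning step mentioned in the abstract could be added by wrapping the best pipeline in scikit-learn's `GridSearchCV`, which is again an assumption about tooling rather than something the abstract specifies.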