Substantial advancements have been achieved in Tunnel Boring Machine (TBM) technology and monitoring systems, yet missing data impedes accurate analysis and interpretation of TBM monitoring results. This study investigates the issue of missing data in extensive TBM datasets. Through a comprehensive literature review, we analyze the mechanisms of missing TBM data and compare different imputation methods, including statistical analysis and machine learning algorithms. We also examine the impact of various missing patterns and rates on the efficacy of these methods. Finally, we propose a dynamic interpolation strategy tailored for TBM engineering sites. The results show that the K-Nearest Neighbors (KNN) and Random Forest (RF) algorithms achieve good interpolation results; that the interpolation performance of all methods decreases as the missing rate increases; and that block missing is interpolated worst, followed by mixed missing, while sporadic missing is interpolated best. On-site application results validate the proposed interpolation strategy's capability to achieve robust missing-value interpolation, applicable in machine learning (ML) scenarios such as parameter optimization, attitude warning, and pressure prediction. These findings contribute to enhancing the efficiency of TBM missing data processing, offering more effective support for large-scale TBM monitoring datasets.
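To make the KNN imputation idea concrete, the following is a minimal sketch in plain Python, not the study's implementation. The helper name `knn_impute`, the toy records, and the choice of k are all assumptions made for illustration; distances are computed only over columns that both rows have observed.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column in the k nearest rows.

    Hypothetical helper for illustration only. Distances use the original
    (unfilled) rows and only the columns observed in both rows.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                # Root-mean-square distance over the shared columns.
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Made-up records (three hypothetical monitoring channels per row).
records = [
    [1.0, 10.0, 100.0],
    [1.1, 11.0, None],   # sporadic missing value
    [5.0, 50.0, 500.0],
    [1.2, 12.0, 120.0],
]
imputed = knn_impute(records, k=2)
```

In this toy case the missing entry is filled from the two rows closest to it in the observed channels. Real monitoring channels would normally be scaled before distances are computed, and a long block of missing values leaves few useful neighbors, which is consistent with the poorer results the abstract reports for block missing.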
As pivotal supporting technologies for smart manufacturing and digital engineering, model-based and data-driven methods have been widely applied in many industrial fields, such as product design, process monitoring, and smart maintenance. While promising, both methods have issues that need to be addressed. For example, model-based methods are limited by low computational accuracy and a high computational burden, and data-driven methods often suffer from poor interpretability and redundant features. To address these issues, the concept of data-model fusion (DMF) has emerged as a promising solution. DMF integrates model-based methods with data-driven methods by incorporating big data into model-based methods or embedding relevant domain knowledge into data-driven methods. Despite growing efforts in the field of DMF, a unanimous definition of DMF remains elusive, and a general framework of DMF has rarely been discussed. This paper addresses this gap by providing a thorough overview and categorization of both data-driven and model-based methods. It then presents the definition and categorization of DMF and discusses the general framework of DMF. Moreover, seven primary applications of DMF are reviewed within the context of smart manufacturing and digital engineering. Finally, this paper outlines future directions for DMF.
Data hiding methods involve embedding secret messages into cover objects to enable covert communication in a way that is difficult to detect. In data hiding methods based on image interpolation, the image size is reduced and then enlarged through interpolation, followed by the embedding of secret data into the newly generated pixels. A general approach for improving the embedding of secret messages is proposed. The approach may be regarded as a general model for enhancing the data embedding capacity of various existing image interpolation-based data hiding methods. This enhancement is achieved by expanding the range of pixel values available for embedding secret messages, removing the limitation of many existing methods in which the range is restricted to powers of two to facilitate the direct embedding of bit-based messages. The improvement is accomplished by applying multiple-based number conversion to the secret message data. The method converts the message bits into a multiple-based number and uses an algorithm to embed each digit of this number into an individual pixel, thereby enhancing the message embedding efficiency, as proved by a theorem derived in this study. The proposed improvement method has been tested through experiments on three well-known image interpolation-based data hiding methods. The results show that the proposed method can enhance the three data embedding rates by approximately 14%, 13%, and 10%, respectively, create stego-images with good quality, and resist RS steganalysis attacks. These experimental results indicate that using the multiple-based number conversion technique to improve the three interpolation-based methods increases the number of message bits embedded in the images. For the many other image interpolation-based data hiding methods that use power-of-two pixel-value ranges for message embedding, the proposed improvement method is also expected to be effective in enhancing their data embedding capabilities.
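The capacity gain from multiple-based number conversion can be illustrated with a short sketch, assuming each pixel i offers an embedding range of `bases[i]` values. The function names and the example ranges (3, 5, 6) are hypothetical, and the paper's actual embedding algorithm is not reproduced here; only the number conversion is shown.

```python
def bits_to_digits(bits, bases):
    """Convert a bit string into one digit per pixel (least significant first).

    Digit i lies in range(bases[i]); hypothetical helper for illustration.
    """
    value = int(bits, 2)
    digits = []
    for base in bases:
        value, digit = divmod(value, base)
        digits.append(digit)
    if value != 0:
        raise ValueError("message does not fit in the given bases")
    return digits

def digits_to_bits(digits, bases, n_bits):
    """Inverse conversion: rebuild the n_bits-long bit string from the digits."""
    value = 0
    for digit, base in zip(reversed(digits), reversed(bases)):
        value = value * base + digit
    return format(value, "0{}b".format(n_bits))

def capacity_bits(bases):
    """Largest n such that 2**n does not exceed the product of the bases."""
    product = 1
    for base in bases:
        product *= base
    return product.bit_length() - 1

bases = [3, 5, 6]                       # assumed per-pixel embedding ranges
power_of_two_bits = sum(b.bit_length() - 1 for b in bases)  # 1 + 2 + 2
mixed_bits = capacity_bits(bases)       # 3 * 5 * 6 = 90 >= 2**6

message = "101101"
digits = bits_to_digits(message, bases)
recovered = digits_to_bits(digits, bases, len(message))
```

With these assumed ranges, truncating each range to a power of two carries 5 bits across the three pixels, whereas treating the three pixels jointly as one multiple-based number carries 6 bits.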
The uniaxial compressive strength (UCS) of rocks is a vital geomechanical parameter widely used for rock mass classification, stability analysis, and engineering design in rock engineering. Various UCS testing methods and apparatuses have been proposed over the past few decades. The objective of the present study is to summarize the status and development of the theories, test apparatuses, and data processing of existing testing methods for UCS measurement. It starts by elaborating the theories of these test methods. Then the test apparatuses and development trends for UCS measurement are summarized, followed by a discussion of rock specimens for the test apparatuses and of data processing methods. Next, recommendations are given for selecting a method for UCS measurement. The review reveals that the rock failure mechanisms in the UCS testing methods can be divided into compression-shear, compression-tension, composite failure, and no obvious failure modes. The apparatuses are trending towards automation, digitization, precision, and multi-modal testing. Two size correction methods are commonly used. One is to develop an empirical correlation between the measured indices and the specimen size. The other is to use a standard specimen to calculate the size correction factor. Three to five input parameters are commonly utilized in soft computing models to predict the UCS of rocks. The test method for UCS measurement can be selected according to the testing scenario and the specimen size. Engineers can thereby gain a comprehensive understanding of the UCS testing methods and their potential developments in various rock engineering endeavors.
High-quality data is essential for the success of data-driven learning tasks. The characteristics, precision, and completeness of the datasets critically determine the reliability, interpretability, and effectiveness of subsequent analyses and applications, such as fault detection, predictive maintenance, and process optimization. However, for many industrial processes, obtaining sufficient high-quality data remains a significant challenge due to high costs, safety concerns, and practical constraints. To overcome these challenges, data augmentation has emerged as a rapidly growing research area, attracting considerable attention across both academia and industry. By expanding datasets, data augmentation techniques improve generalization and yield more robust performance in actual applications. This paper provides a comprehensive, multi-perspective review of data augmentation methods for industrial processes. For clarity and organization, existing studies are systematically grouped into four categories: small sample with low dimension, small sample with high dimension, large sample with low dimension, and large sample with high dimension. Within this framework, the review examines current research from both methodological and application-oriented perspectives, highlighting main methods, advantages, and limitations. By synthesizing these findings, this review offers a structured overview for scholars and practitioners, serving as a valuable reference for newcomers and experienced researchers seeking to explore and advance data augmentation techniques in industrial processes.
Snow cover in mountainous areas is characterized by high reflectivity, strong spatial heterogeneity, rapid changes, and susceptibility to cloud interference. However, due to the limitations of a single sensor, it is challenging to obtain high-resolution satellite remote sensing data for monitoring the dynamic changes of snow cover within a day. This study focuses on two typical data fusion methods for polar-orbiting satellites (Sentinel-3 SLSTR) and geostationary satellites (Himawari-9 AHI), and explores the snow cover detection accuracy of a multitemporal cloud-gap snow cover identification model (loose data fusion) and the ESTARFM (spatiotemporal data fusion). Taking the Qilian Mountains as the research area, the accuracy of the two data fusion results was verified using the snow cover extracted from Landsat-8 SR products. The results showed that both data fusion models could effectively capture the spatiotemporal variations of snow cover, but the ESTARFM demonstrated superior performance. It not only obtained fusion images at any target time, but also extracted snow cover that was closer to the spatial distribution of real satellite images. Therefore, the ESTARFM was utilized to fuse images for hourly reconstruction of the snow cover on February 14–15, 2023. It was found that the maximum snow cover area of this snowfall reached 83.84% of the Qilian Mountains area, and the melting rate of the snow was extremely rapid, with a change of up to 4.30% of the study area per hour. This study offers reliable high spatiotemporal resolution satellite remote sensing data for monitoring snow cover changes in mountainous areas, contributing to more accurate and timely assessments.
Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with data preprocessing that incorporates both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized and model-ready inputs. Dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks’ hunting strategies. HHO efficiently identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners. This layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded with an accuracy of 99.44% for UNSW-NB15, demonstrating the model’s effectiveness. After balancing, the model demonstrated a clear improvement in detecting the attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
A three-dimensional (3D) electromagnetic (EM) inversion algorithm based on the nonlinear conjugate gradient (NLCG) method and a two-color plane Gauss-Seidel (GS) multigrid (MG) forward solver is developed to improve inversion efficiency. The results indicate that the computational efficiency of each inversion can be improved by approximately a factor of three by using the proposed MG solver. First, the accuracy of the MG solver is validated through a test on a synthetic model. Next, the numerical performance of the inversion algorithm is evaluated using this model. Finally, the inversion algorithm is applied to field EM data collected at the Beiya gold polymetallic ore district. A 3D resistivity model is obtained, and the formation process of the metal ore is analyzed.
Accurate acquisition and prediction of acoustic parameters of seabed sediments are crucial in marine sound propagation research. While the relationship between sound velocity and the physical properties of sediment has been extensively studied, there is still no consensus on the correlation between the acoustic attenuation coefficient and sediment physical properties. Predicting the acoustic attenuation coefficient remains a challenging issue in sedimentary acoustic research. In this study, we propose a prediction method for the acoustic attenuation coefficient using machine learning algorithms, specifically the random forest (RF), support vector machine (SVR), and convolutional neural network (CNN) algorithms. We used the acoustic attenuation coefficient and sediment particle size data from 52 stations as training data, with the particle size parameters as the input feature matrix and the measured acoustic attenuation as the training label, to validate the attenuation prediction model. Our results indicate that the error of the attenuation prediction model is small. Among the three models, the RF model exhibited the lowest prediction error, with a mean squared error of 0.8232, a mean absolute error of 0.6613, and a root mean squared error of 0.9073. Additionally, when we applied the models to data collected at different times in the same region, we found that they also demonstrated a certain level of reliability in real prediction scenarios. Our approach demonstrates that constructing a sediment acoustic characteristics model based on machine learning is feasible to a certain extent and offers a novel perspective for studying sediment acoustic properties.
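The three error measures reported above are related in a simple way, which the following sketch illustrates. The function name `regression_errors` and the measured/predicted values are made up for illustration and are not the study's data; only the final check uses figures from the abstract.

```python
import math

def regression_errors(measured, predicted):
    """Return (MSE, MAE, RMSE) for paired measured and predicted values."""
    residuals = [p - m for m, p in zip(measured, predicted)]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    return mse, mae, math.sqrt(mse)

# Made-up attenuation values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.0, 5.0]
mse, mae, rmse = regression_errors(measured, predicted)

# RMSE is the square root of MSE, so the reported RF figures can be
# cross-checked against each other: sqrt(0.8232) rounds to 0.9073.
reported_rmse_check = round(math.sqrt(0.8232), 4)
```

The reported mean squared error and root mean squared error for the RF model are mutually consistent under this relation.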
Seismic data plays a pivotal role in fault detection, offering critical insights into subsurface structures and seismic hazards. Understanding fault detection from seismic data is essential for mitigating seismic risks and guiding land-use plans. This paper presents a comprehensive review of existing methodologies for fault detection, focusing on the application of Machine Learning (ML) and Deep Learning (DL) techniques to enhance accuracy and efficiency. Various ML and DL approaches are analyzed with respect to fault segmentation, adaptive learning, and fault detection models. These techniques, benchmarked against established seismic datasets, reveal significant improvements over classical methods in terms of accuracy and computational efficiency. Additionally, this review highlights emerging trends, including hybrid model applications and the integration of real-time data processing for seismic fault detection. By providing a detailed comparative analysis of current methodologies, this review aims to guide future research and foster advancements in the effectiveness and reliability of seismic studies. Ultimately, the study seeks to bridge the gap between theoretical investigations and practical implementations in fault detection.
When assessing seismic liquefaction potential with data-driven models, addressing the uncertainties in establishing models, interpreting cone penetration test (CPT) data, and setting decision thresholds is crucial for avoiding biased data selection, ameliorating overconfident models, and remaining flexible to varying practical objectives, especially when the training and testing data are not identically distributed. A workflow built on Bayesian methodology was proposed to address these issues. Employing a Multi-Layer Perceptron (MLP) as the foundational model, this approach was benchmarked against empirical methods and advanced algorithms for its efficacy in simplicity, accuracy, and resistance to overfitting. The analysis revealed that, while MLP models optimized via the maximum a posteriori algorithm suffice for straightforward scenarios, Bayesian neural networks showed great potential for preventing overfitting. Additionally, integrating decision thresholds through various evaluative principles offers insights for challenging decisions. Two case studies demonstrate the framework's capacity for nuanced interpretation of in situ data, employing a model committee for a detailed evaluation of liquefaction potential via Monte Carlo simulations and basic statistics. Overall, the proposed step-by-step workflow for analyzing seismic liquefaction incorporates multifold testing and real-world data validation, showing improved robustness against overfitting and greater versatility in addressing practical challenges. This research contributes to the seismic liquefaction assessment field by providing a structured, adaptable methodology for accurate and reliable analysis.
Earthquakes are highly destructive spatio-temporal phenomena whose analysis is essential for disaster preparedness and risk mitigation. Modern seismological research produces vast volumes of heterogeneous data from seismic networks, satellite observations, and geospatial repositories, creating the need for scalable infrastructures capable of integrating and analyzing such data to support intelligent decision-making. Data warehousing technologies provide a robust foundation for this purpose; however, existing earthquake-oriented data warehouses remain limited, often relying on simplified schemas, domain-specific analytics, or cataloguing efforts. This paper presents the design and implementation of a spatio-temporal data warehouse for seismic activity. The framework integrates spatial and temporal dimensions in a unified schema and introduces a novel array-based approach for managing many-to-many relationships between facts and dimensions without intermediate bridge tables. A comparative evaluation against a conventional bridge-table schema demonstrates that the array-based design improves fact-centric query performance, while the bridge-table schema remains advantageous for dimension-centric queries. To reconcile these trade-offs, a hybrid schema is proposed that retains both representations, ensuring balanced efficiency across heterogeneous workloads. The proposed framework demonstrates how spatio-temporal data warehousing can address schema complexity, improve query performance, and support multidimensional visualization. In doing so, it provides a foundation for integrating seismic analysis into broader big data-driven intelligent decision systems for disaster resilience, risk mitigation, and emergency management.
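The difference between the array-based and bridge-table representations of a many-to-many relationship can be sketched with in-memory structures. This illustrates the data layout only, not the paper's schema or its performance results; the event and region identifiers, field names, and function names are hypothetical.

```python
# Array-based layout: each fact row carries the ids of its related
# dimension members directly (hypothetical earthquake events and regions).
facts_array = {
    101: {"magnitude": 5.8, "region_ids": [1, 2]},
    102: {"magnitude": 4.1, "region_ids": [2]},
}

# Bridge-table layout: one (event_id, region_id) row per relationship.
bridge = [(101, 1), (101, 2), (102, 2)]

def regions_of_event_array(event_id):
    """Fact-centric lookup: the related ids are read straight off the fact."""
    return list(facts_array[event_id]["region_ids"])

def regions_of_event_bridge(event_id):
    """Fact-centric lookup through the bridge: filter the relationship rows."""
    return [r for e, r in bridge if e == event_id]

def events_in_region_array(region_id):
    """Dimension-centric lookup: every fact's array must be inspected."""
    return sorted(e for e, fact in facts_array.items()
                  if region_id in fact["region_ids"])

def events_in_region_bridge(region_id):
    """Dimension-centric lookup through the bridge rows."""
    return sorted(e for e, r in bridge if r == region_id)
```

In the array layout a fact-centric question needs no join, whereas a dimension-centric question has to look inside every fact's array. In an actual database the bridge table would typically be indexed on the dimension key, which is one plausible reason the abstract reports it as advantageous for dimension-centric queries.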
Accurate determination of rockhead is crucial for underground construction. Traditionally, borehole data are mainly used for this purpose. However, borehole drilling is costly, time-consuming, and sparsely distributed. Non-invasive geophysical methods, particularly those using passive seismic surface waves, have emerged as viable alternatives for geological profiling and rockhead detection. This study proposes three interpretation methods for rockhead determination using passive seismic surface wave data from Microtremor Array Measurement (MAM) and Horizontal-to-Vertical Spectral Ratio (HVSR) tests. These are: (1) the Wavelength-Normalized phase velocity (WN) method, in which a nonlinear relationship between rockhead depth and wavelength is established; (2) the Statistically Determined-shear wave velocity (SD-V_s) method, in which the representative V_s value for rockhead is automatically determined using a statistical method; and (3) the empirical HVSR method, in which the rockhead is determined by interpreting resonant frequencies using a reliably calibrated empirical equation. These methods were implemented to determine rockhead depths at 28 locations across two distinct geological formations in Singapore, and the results were evaluated using borehole data. The WN method can determine rockhead depths accurately and reliably with minimal absolute errors (average RMSE = 3.11 m), demonstrating robust performance across both geological formations. Its advantage lies in interpreting dispersion curves alone, without the need for the inversion process. The SD-V_s method is practical in engineering practice owing to its simplicity. The empirical HVSR method reasonably determines rockhead depths with moderate accuracy, benefiting from a reliably calibrated empirical equation.
Snow cover plays a critical role in global climate regulation and hydrological processes. Accurate monitoring is essential for understanding snow distribution patterns, managing water resources, and assessing the impacts of climate change. Remote sensing has become a vital tool for snow monitoring, and the Moderate-resolution Imaging Spectroradiometer (MODIS) snow products from the Terra and Aqua satellites are widely used. However, cloud cover often interferes with snow detection, making cloud removal techniques crucial for reliable snow product generation. This study evaluated the accuracy of four MODIS snow cover datasets generated through different cloud removal algorithms. Using real-time field camera observations from four stations in the Tianshan Mountains, China, this study assessed the performance of these datasets during three distinct snow periods: the snow accumulation period (September-November), the snowmelt period (March-June), and the stable snow period (December-February of the following year). The findings showed that cloud-free snow products generated using the Hidden Markov Random Field (HMRF) algorithm consistently outperformed the others, particularly under cloud cover, while cloud-free snow products using near-day synthesis and the spatiotemporal adaptive fusion method with error correction (STAR) demonstrated varying performance depending on terrain complexity and cloud conditions. This study highlighted the importance of considering terrain features, land cover types, and snow dynamics when selecting cloud removal methods, particularly in areas with rapid snow accumulation and melting. The results suggested that future research should focus on improving cloud removal algorithms through the integration of machine learning, multi-source data fusion, and advanced remote sensing technologies. By expanding validation efforts and refining cloud removal strategies, more accurate and reliable snow products can be developed, contributing to enhanced snow monitoring and better management of water resources in alpine and arid areas.
This work contributes to the theoretical foundation for pricing in data markets and offers practical insights for managing digital data exchanges in the era of big data. We propose a structured pricing model for data exchanges transitioning from quasi-public to market-oriented operations. To address the complex dynamics among data exchanges, suppliers, and consumers, we develop a three-stage Stackelberg game framework. In this model, the data exchange acts as a leader setting transaction commission rates, suppliers are intermediate leaders determining unit prices, and consumers are followers making purchasing decisions. Two pricing strategies are examined: the Independent Pricing Approach (IPA) and the novel Perfectly Competitive Pricing Approach (PCPA), which accounts for competition among data providers. Using backward induction, the study derives subgame-perfect equilibria and proves the existence and uniqueness of Stackelberg equilibria under both approaches. Extensive numerical simulations of the model demonstrate that PCPA enhances data demander utility, encourages supplier competition, increases transaction volume, and improves the overall profitability and sustainability of data exchanges. Social welfare analysis further confirms PCPA’s superiority in promoting efficient and fair data markets.
Ovarian cancer (OC) is one of the leading causes of death related to gynecological cancer, mainly because of the difficulty of early diagnosis and the heterogeneous nature of tumor biomarkers. Machine learning (ML) has the potential to process complex datasets and support decision-making in OC diagnosis. Nevertheless, traditional ML models tend to suffer from bias, overfitting, sensitivity to noise, and poor generalization. Moreover, their black-box nature reduces interpretability and limits their practical clinical applicability. In this study, we introduce an explainable ensemble learning (EL) model, TreeX-Stack, based on a stacking architecture that employs tree-based learners such as Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGBoost) as base learners, and Logistic Regression (LR) as the meta-learner, to enhance OC diagnosis. Local Interpretable Model-Agnostic Explanations (LIME) are used to explain individual predictions, making the model outputs more clinically interpretable and applicable. The model is trained on a dataset that includes demographic information, blood tests, general chemistry, and tumor markers. Extensive preprocessing includes handling missing data using iterative imputation with Bayesian Ridge and addressing multicollinearity by removing features with correlation coefficients above 0.7. Relevant features are then selected using the Boruta feature selection method. To obtain robust and unbiased performance estimates during hyperparameter tuning, nested cross-validation (CV) with grid search is employed, and all experiments are repeated five times to ensure statistical reliability. TreeX-Stack demonstrates excellent diagnostic performance, achieving an accuracy of 0.9027, a precision of 0.8673, a recall of 0.9391, and an F1-score of 0.9012. Feature-importance analyses using LIME and permutation importance highlight Human Epididymis Protein 4 (HE4) as the most significant biomarker for OC. The combination of high predictive performance and interpretability makes TreeX-Stack a reliable tool for clinical decision support in OC diagnosis.
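The multicollinearity step (removing features whose correlation coefficient exceeds 0.7) could look like the sketch below. The abstract does not say which feature of a correlated pair is dropped, so the greedy keep-first rule here is an assumption, as are the function names and the toy feature columns.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.7):
    """Keep a feature only if |r| with every already-kept feature is <= threshold.

    Greedy keep-first rule; an assumption, not the paper's stated procedure.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy columns with hypothetical names; "b" is nearly a multiple of "a".
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.5],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_correlated(features, threshold=0.7)
```

Here "b" is removed because it is almost perfectly correlated with "a", while "c" is retained. The order in which features are visited determines which member of a correlated pair survives, so a real pipeline would need to fix that order deliberately.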
Accurately assessing the relationship between tree growth and climatic factors is of great importance in dendrochronology. This study evaluated the consistency between alternative climate datasets (including station and gridded data) and actual climate data (fixed-point observations near the sampling sites) in northeastern China’s warm temperate zone and analyzed differences in their correlations with the tree-ring width index. The results were: (1) Gridded temperature data, as well as precipitation and relative humidity data from the Huailai meteorological station, were more consistent with the actual climate data; in contrast, gridded soil moisture content data showed significant discrepancies. (2) Horizontal distance had a greater impact on the representativeness of actual climate conditions than vertical elevation differences. (3) Differences in consistency between alternative and actual climate data also affected their correlations with tree-ring width indices. In some growing season months, correlation coefficients differed significantly, in both magnitude and sign, from those based on actual data. The selection of different alternative climate datasets can lead to biased results in assessing forest responses to climate change, which is detrimental to the management of forest ecosystems in harsh environments. Therefore, the scientific and rational selection of alternative climate data is essential for dendroecological and climatological research.
To address the severe challenges of PM2.5 and ozone co-control during the "14th Five-Year Plan" period and to enhance the precision and intelligence of air environment governance, it is imperative to build an efficient comprehensive management platform for regional air quality. In this paper, the practice in Zibo City, Shandong Province is taken as an example to systematically analyze the top-level design, technical implementation, and innovative application of a comprehensive management platform for regional air quality integrating "perception monitoring, data fusion, research judgment of early warnings, analysis of sources, collaborative dispatching, and evaluation assessment". Through the construction of a "sky-air-ground" integrated three-dimensional monitoring network, the platform integrates multi-source heterogeneous environmental data and employs big data, cloud computing, artificial intelligence, CALPUFF/CMAQ, and other numerical model technologies to achieve comprehensive perception, precise prediction, intelligent source tracing, and closed-loop management of air pollution. The platform establishes a full-process closed-loop management mechanism of "data-early warning-disposition-evaluation" and achieves a fundamental transformation in environmental supervision from passive response to active anticipation and from experience-based judgment to data-driven judgment. The application results show that this platform significantly improves the scientific decision-making ability and collaborative execution efficiency of air pollution governance in Zibo City, providing a replicable and scalable comprehensive solution for similar industrial cities to achieve the continuous improvement of air quality.
tRNA-derived small RNAs (tsRNAs), a class of regulatory small noncoding RNA, have been implicated in a wide variety of human diseases. A large number of tsRNA–disease associations have been identified in recent years from accumulating studies. However, repositories cataloging detailed information on tsRNA–disease associations are scarce. In this study, we provide the tsRNADisease database by integrating experimentally and computationally supported tsRNA–disease associations from manual curation of the literature and other related resources. tsRNADisease contains 5571 manually curated associations between 4759 tsRNAs and 166 diseases with experimental evidence from 346 studies. In addition, it contains 5013 predicted associations between 1297 tsRNAs and 111 diseases. tsRNADisease provides a user-friendly interface to browse, retrieve, and download data conveniently. This database can improve our understanding of tsRNA deregulation in diseases and serve as a valuable resource for investigating the mechanism of disease-related tsRNAs. tsRNADisease is freely available at http://www.compgenelab.info/tsRNADisease.
0 INTRODUCTION Earth science is a natural science concerned with the composition, dynamics, spatiotemporal evolution, and formation mechanisms of Earth materials (Chen and Yang, 2023). Traditional Earth science research has largely been discipline-based, relying on field investigations, data collection, experimental analyses, and data interpretation to study individual components of the Earth system.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52409151), the Programme of Shenzhen Key Laboratory of Green, Efficient and Intelligent Construction of Underground Metro Station (Programme No. ZDSYS20200923105200001), and the Science and Technology Major Project of Xizang Autonomous Region of China (XZ202201ZD0003G).
Abstract: Substantial advancements have been achieved in Tunnel Boring Machine (TBM) technology and monitoring systems, yet the presence of missing data impedes accurate analysis and interpretation of TBM monitoring results. This study investigates the issue of missing data in extensive TBM datasets. Through a comprehensive literature review, we analyze the mechanism of missing TBM data and compare different imputation methods, including statistical analysis and machine learning algorithms. We also examine the impact of various missing patterns and rates on the efficacy of these methods. Finally, we propose a dynamic interpolation strategy tailored for TBM engineering sites. The results show that the K-Nearest Neighbors (KNN) and Random Forest (RF) algorithms achieve good interpolation results; that the interpolation performance of all methods decreases as the missing rate increases; and that interpolation performance is poorest for block missing, intermediate for mixed missing, and best for sporadic missing. On-site application results validate the capability of the proposed interpolation strategy to achieve robust missing-value interpolation, applicable in ML scenarios such as parameter optimization, attitude warning, and pressure prediction. These findings contribute to enhancing the efficiency of TBM missing data processing, offering more effective support for large-scale TBM monitoring datasets.
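As a rough illustration of the KNN imputation idea named in this abstract, the sketch below fills a missing value with the mean of the same column over the k nearest complete records. It is a minimal sketch under stated assumptions, not the paper's strategy: the function name `knn_impute`, the Euclidean distance on observed columns, and the sample records are all hypothetical.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries using the mean over the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("at least one complete row is required")
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        # Only columns observed in the incomplete row enter the distance.
        observed = [i for i, v in enumerate(row) if v is not None]

        def dist(other, row=row, observed=observed):
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=dist)[:k]
        filled.append([
            v if v is not None
            else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ])
    return filled

# Hypothetical records (thrust, torque, advance rate); the values are made up.
records = [
    (10.0, 2.0, 30.0),
    (11.0, 2.2, 32.0),
    (20.0, 4.0, 60.0),
    (10.5, None, 31.0),
]
imputed = knn_impute(records, k=2)
```

In this toy case the missing torque is taken from the two records closest in thrust and advance rate. A block of consecutive missing records would leave fewer nearby complete neighbours, which is consistent with the abstract's observation that block missing is the hardest pattern.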
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) under Grants 52275471 and 52120105008, the Beijing Outstanding Young Scientist Program, and the New Cornerstone Science Foundation through the XPLORER PRIZE.
Abstract: As pivotal supporting technologies for smart manufacturing and digital engineering, model-based and data-driven methods have been widely applied in many industrial fields, such as product design, process monitoring, and smart maintenance. While promising, both methods have issues that need to be addressed. For example, model-based methods are limited by low computational accuracy and a high computational burden, and data-driven methods always suffer from poor interpretability and redundant features. To address these issues, the concept of data-model fusion (DMF) emerges as a promising solution. DMF involves integrating model-based methods with data-driven methods by incorporating big data into model-based methods or embedding relevant domain knowledge into data-driven methods. Despite growing efforts in the field of DMF, a unanimous definition of DMF remains elusive, and a general framework of DMF has rarely been discussed. This paper aims to address this gap by providing a thorough overview and categorization of both data-driven methods and model-based methods. Subsequently, this paper presents the definition and categorization of DMF and discusses the general framework of DMF. Moreover, seven primary applications of DMF are reviewed within the context of smart manufacturing and digital engineering. Finally, this paper outlines future directions for DMF.
Abstract: Data hiding methods involve embedding secret messages into cover objects to enable covert communication in a way that is difficult to detect. In data hiding methods based on image interpolation, the image size is reduced and then enlarged through interpolation, followed by the embedding of secret data into the newly generated pixels. A general approach for improving the embedding of secret messages is proposed. The approach may be regarded as a general model for enhancing the data embedding capacity of various existing image interpolation-based data hiding methods. This enhancement is achieved by expanding the range of pixel values available for embedding secret messages, removing the limitation of many existing methods in which the range is restricted to powers of two to facilitate the direct embedding of bit-based messages. This improvement is accomplished through the application of multiple-based number conversion to the secret message data. The method converts the message bits into a multiple-based number and uses an algorithm to embed each digit of this number into an individual pixel, thereby enhancing the message embedding efficiency, as proved by a theorem derived in this study. The proposed improvement method has been tested through experiments on three well-known image interpolation-based data hiding methods. The results show that the proposed method can enhance the three data embedding rates by approximately 14%, 13%, and 10%, respectively, create stego-images with good quality, and resist RS steganalysis attacks. These experimental results indicate that using the multiple-based number conversion technique to improve the three interpolation-based methods increases the number of message bits embedded in the images. For the many other image interpolation-based data hiding methods that use power-of-two pixel-value ranges for message embedding, the proposed improvement method is also expected to be effective in enhancing their data embedding capabilities.
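The multiple-based (mixed-radix) conversion described in this abstract can be illustrated with a short sketch. It assumes each pixel i offers a hypothetical range of `bases[i]` embeddable values that need not be a power of two; the function names and the example bases are illustrative and are not taken from the paper.

```python
import math

def bits_to_mixed_radix(bits, bases):
    """Convert a bit string into one digit per pixel, with digit i < bases[i]."""
    value = int(bits, 2)
    digits = []
    for base in bases:
        value, digit = divmod(value, base)
        digits.append(digit)
    if value:
        raise ValueError("message does not fit into the given bases")
    return digits

def mixed_radix_to_bits(digits, bases, n_bits):
    """Recover the original bit string from the per-pixel digits."""
    value = 0
    for digit, base in zip(reversed(digits), reversed(bases)):
        value = value * base + digit
    return format(value, f"0{n_bits}b")

def capacity_mixed_radix(bases):
    """Whole bits that fit when all pixels are used jointly."""
    return math.prod(bases).bit_length() - 1

def capacity_power_of_two(bases):
    """Whole bits that fit when each pixel range is truncated to a power of two."""
    return sum(b.bit_length() - 1 for b in bases)

bases = [3, 5, 6]            # hypothetical per-pixel value ranges
message = "101101"           # 6 message bits
digits = bits_to_mixed_radix(message, bases)
recovered = mixed_radix_to_bits(digits, bases, len(message))
```

With these example ranges, the joint capacity is 6 bits (3 × 5 × 6 = 90 ≥ 64), whereas truncating each range to a power of two yields only 1 + 2 + 2 = 5 bits, which is the kind of gain the abstract attributes to lifting the power-of-two restriction.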
Funding: The National Natural Science Foundation of China (Grant Nos. 52308403 and 52079068), the Yunlong Lake Laboratory of Deep Underground Science and Engineering (No. 104023005), and the China Postdoctoral Science Foundation (Grant No. 2023M731998) provided funding for this work.
Abstract: The uniaxial compressive strength (UCS) of rocks is a vital geomechanical parameter widely used for rock mass classification, stability analysis, and engineering design in rock engineering. Various UCS testing methods and apparatuses have been proposed over the past few decades. The objective of the present study is to summarize the status and development of the theories, test apparatuses, and data processing of existing testing methods for UCS measurement. It starts by elaborating the theories of these test methods. Then the test apparatuses and development trends for UCS measurement are summarized, followed by a discussion of rock specimens for the test apparatuses and of data processing methods. Next, recommendations are given for selecting a UCS measurement method. The review reveals that the rock failure mechanisms in UCS testing methods can be divided into compression-shear, compression-tension, composite failure mode, and no obvious failure mode. The apparatuses are trending towards automation, digitization, precision, and multi-modal testing. Two size correction methods are commonly used: one develops an empirical correlation between the measured indices and the specimen size; the other uses a standard specimen to calculate the size correction factor. Three to five input parameters are commonly utilized in soft computing models to predict the UCS of rocks. The test method for UCS measurement can be selected according to the testing scenario and the specimen size. Engineers can thereby gain a comprehensive understanding of UCS testing methods and their potential developments in various rock engineering endeavors.
Funding: Supported by the Postdoctoral Fellowship Program (Grade B) of China (GZB20250435) and the National Natural Science Foundation of China (62403270).
Abstract: High-quality data is essential for the success of data-driven learning tasks. The characteristics, precision, and completeness of the datasets critically determine the reliability, interpretability, and effectiveness of subsequent analyses and applications, such as fault detection, predictive maintenance, and process optimization. However, for many industrial processes, obtaining sufficient high-quality data remains a significant challenge due to high costs, safety concerns, and practical constraints. To overcome these challenges, data augmentation has emerged as a rapidly growing research area, attracting considerable attention across both academia and industry. By expanding datasets, data augmentation techniques enable better generalization and more robust performance in practical applications. This paper provides a comprehensive, multi-perspective review of data augmentation methods for industrial processes. For clarity and organization, existing studies are systematically grouped into four categories: small sample with low dimension, small sample with high dimension, large sample with low dimension, and large sample with high dimension. Within this framework, the review examines current research from both methodological and application-oriented perspectives, highlighting the main methods, advantages, and limitations. By synthesizing these findings, this review offers a structured overview for scholars and practitioners, serving as a valuable reference for newcomers and experienced researchers seeking to explore and advance data augmentation techniques in industrial processes.
Funding: Funded by the National Natural Science Foundation of China (42361058) and supported by the Science and Technology Program of Gansu Province (22YF7FA074).
Abstract: Snow cover in mountainous areas is characterized by high reflectivity, strong spatial heterogeneity, rapid changes, and susceptibility to cloud interference. However, due to the limitations of a single sensor, it is challenging to obtain high-resolution satellite remote sensing data for monitoring the dynamic changes of snow cover within a day. This study focuses on two typical data fusion methods for polar-orbiting satellites (Sentinel-3 SLSTR) and geostationary satellites (Himawari-9 AHI), and explores the snow cover detection accuracy of a multitemporal cloud-gap snow cover identification model (loose data fusion) and the ESTARFM (spatiotemporal data fusion). Taking the Qilian Mountains as the research area, the accuracy of the two data fusion results was verified using the snow cover extracted from Landsat-8 SR products. The results showed that both data fusion models could effectively capture the spatiotemporal variations of snow cover, but the ESTARFM demonstrated superior performance. It not only obtained fusion images at any target time, but also extracted snow cover that was closer to the spatial distribution of real satellite images. Therefore, the ESTARFM was used to fuse images for hourly reconstruction of the snow cover on February 14–15, 2023. The maximum snow cover area of this snowfall reached 83.84% of the Qilian Mountains area, and the snow melted extremely rapidly, with a change of up to 4.30% of the study area per hour. This study offers reliable satellite remote sensing data with high spatiotemporal resolution for monitoring snow cover changes in mountainous areas, contributing to more accurate and timely assessments.
Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with data preprocessing that incorporates both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized and model-ready inputs. Dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks' hunting strategies. HHO identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners. This layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded with an accuracy of 99.44% for UNSW-NB15, demonstrating the model's effectiveness. After balancing, the model demonstrated a clear improvement in detecting the attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
Funding: Financially supported by the National Science and Technology Major Project, China (No. 2024ZD1002100), the National Natural Science Foundation of China (Nos. 42330801, 42474112, 42504062), the China Postdoctoral Science Foundation (No. 2024M761704), and the Shuimu Tsinghua Scholar Program of Tsinghua University, China (No. 2024SM114).
Abstract: A three-dimensional (3D) electromagnetic (EM) inversion algorithm based on the nonlinear conjugate gradient (NLCG) method and a two-color plane Gauss-Seidel (GS) multigrid (MG) forward solver is developed to improve inversion efficiency. The results indicate that the computational efficiency of each inversion can be improved by approximately a factor of three by using the proposed MG solver. First, the accuracy of the MG solver is validated through a test on a synthetic model. Next, the numerical performance of the inversion algorithm is evaluated using this model. Finally, the inversion algorithm is applied to field EM data collected at the Beiya gold polymetallic ore district. A 3D resistivity model is obtained, and the formation process of the metal ore is analyzed.
Funding: Funded by the Basic Scientific Fund for National Public Research Institutes of China (No. 2022 S01), the National Natural Science Foundation of China (Nos. 42176191, 42049902, and U22A2012), the Shandong Provincial Natural Science Foundation, China (No. ZR2022YQ40), the National Key R&D Program of China (No. 2021YFF0501202), the Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (No. SML2023 SP232), and the Fundamental Research Funds for the Central Universities, Sun Yat-sen University (No. 241gqb006). Data acquisition and sample collections were supported by the National Natural Science Foundation of China Open Research Cruise (Cruise No. NORC2021-02+NORC2021301), funded by the Shiptime Sharing Project of the National Natural Science Foundation of China.
Abstract: Accurate acquisition and prediction of acoustic parameters of seabed sediments are crucial in marine sound propagation research. While the relationship between sound velocity and the physical properties of sediment has been extensively studied, there is still no consensus on the correlation between the acoustic attenuation coefficient and sediment physical properties. Predicting the acoustic attenuation coefficient remains a challenging issue in sedimentary acoustic research. In this study, we propose a prediction method for the acoustic attenuation coefficient using machine learning algorithms, specifically the random forest (RF), support vector regression (SVR), and convolutional neural network (CNN) algorithms. We used the acoustic attenuation coefficient and sediment particle size data from 52 stations as training data, with the particle size parameters as the input feature matrix and the measured acoustic attenuation as the training label, to validate the attenuation prediction model. Our results indicate that the error of the attenuation prediction model is small. Among the three models, the RF model exhibited the lowest prediction error, with a mean squared error of 0.8232, a mean absolute error of 0.6613, and a root mean squared error of 0.9073. Additionally, when we applied the models to data collected at different times in the same region, they also demonstrated a certain level of reliability in real prediction scenarios. Our approach demonstrates that constructing a sediment acoustic characteristics model based on machine learning is feasible to a certain extent and offers a novel perspective for studying sediment acoustic properties.
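The three error metrics quoted for the RF model are related by definition, since the root mean squared error is the square root of the mean squared error. The sketch below writes out the standard definitions and checks that the reported values are mutually consistent; the sample vectors `y_true` and `y_pred` are made up for illustration and are not the study's data.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

# Made-up values, used only to exercise the functions.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
sample_mse = mse(y_true, y_pred)
sample_mae = mae(y_true, y_pred)
sample_rmse = rmse(y_true, y_pred)

# Consistency check on the figures reported in the abstract.
reported_mse = 0.8232
rmse_from_reported_mse = round(math.sqrt(reported_mse), 4)
```

Taking the square root of the reported MSE reproduces the reported RMSE of 0.9073 to four decimal places. The MAE is smaller than the RMSE, as it must be for any set of errors.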
Abstract: Seismic data plays a pivotal role in fault detection, offering critical insights into subsurface structures and seismic hazards. Understanding fault detection from seismic data is essential for mitigating seismic risks and guiding land-use plans. This paper presents a comprehensive review of existing methodologies for fault detection, focusing on the application of Machine Learning (ML) and Deep Learning (DL) techniques to enhance accuracy and efficiency. Various ML and DL approaches are analyzed with respect to fault segmentation, adaptive learning, and fault detection models. These techniques, benchmarked against established seismic datasets, reveal significant improvements over classical methods in terms of accuracy and computational efficiency. Additionally, this review highlights emerging trends, including hybrid model applications and the integration of real-time data processing for seismic fault detection. By providing a detailed comparative analysis of current methodologies, this review aims to guide future research and foster advancements in the effectiveness and reliability of seismic studies. Ultimately, the study seeks to bridge the gap between theoretical investigations and practical implementations in fault detection.
Abstract: When assessing seismic liquefaction potential with data-driven models, addressing the uncertainties in establishing models, interpreting cone penetration test (CPT) data, and setting the decision threshold is crucial for avoiding biased data selection, ameliorating overconfident models, and remaining flexible to varying practical objectives, especially when the training and testing data are not identically distributed. A workflow based on Bayesian methodology was proposed to address these issues. Employing a Multi-Layer Perceptron (MLP) as the foundational model, this approach was benchmarked against empirical methods and advanced algorithms for its simplicity, accuracy, and resistance to overfitting. The analysis revealed that, while MLP models optimized via the maximum a posteriori algorithm suffice for straightforward scenarios, Bayesian neural networks showed great potential for preventing overfitting. Additionally, integrating decision thresholds through various evaluative principles offers insights for challenging decisions. Two case studies demonstrate the framework's capacity for nuanced interpretation of in situ data, employing a model committee for a detailed evaluation of liquefaction potential via Monte Carlo simulations and basic statistics. Overall, the proposed step-by-step workflow for analyzing seismic liquefaction incorporates multifold testing and real-world data validation, showing improved robustness against overfitting and greater versatility in addressing practical challenges. This research contributes to the seismic liquefaction assessment field by providing a structured, adaptable methodology for accurate and reliable analysis.
Abstract: Earthquakes are highly destructive spatio-temporal phenomena whose analysis is essential for disaster preparedness and risk mitigation. Modern seismological research produces vast volumes of heterogeneous data from seismic networks, satellite observations, and geospatial repositories, creating the need for scalable infrastructures capable of integrating and analyzing such data to support intelligent decision-making. Data warehousing technologies provide a robust foundation for this purpose; however, existing earthquake-oriented data warehouses remain limited, often relying on simplified schemas, domain-specific analytics, or cataloguing efforts. This paper presents the design and implementation of a spatio-temporal data warehouse for seismic activity. The framework integrates spatial and temporal dimensions in a unified schema and introduces a novel array-based approach for managing many-to-many relationships between facts and dimensions without intermediate bridge tables. A comparative evaluation against a conventional bridge-table schema demonstrates that the array-based design improves fact-centric query performance, while the bridge-table schema remains advantageous for dimension-centric queries. To reconcile these trade-offs, a hybrid schema is proposed that retains both representations, ensuring balanced efficiency across heterogeneous workloads. The proposed framework demonstrates how spatio-temporal data warehousing can address schema complexity, improve query performance, and support multidimensional visualization. In doing so, it provides a foundation for integrating seismic analysis into broader big data-driven intelligent decision systems for disaster resilience, risk mitigation, and emergency management.
Funding: Partially supported by the Singapore Ministry of National Development and the National Research Foundation, Prime Minister's Office, Singapore, under the Land and Liveability National Innovation Challenge (L2 NIC) Research Program (Grant No. L2NICCFP2-2015-1), and by the National Research Foundation (NRF) of Singapore under the Virtual Singapore program (Grant No. NRF2019VSG-GMS-001).
Abstract: Accurate determination of rockhead is crucial for underground construction. Traditionally, borehole data are mainly used for this purpose. However, borehole drilling is costly and time-consuming, and boreholes are sparsely distributed. Non-invasive geophysical methods, particularly those using passive seismic surface waves, have emerged as viable alternatives for geological profiling and rockhead detection. This study proposes three interpretation methods for rockhead determination using passive seismic surface wave data from Microtremor Array Measurement (MAM) and Horizontal-to-Vertical Spectral Ratio (HVSR) tests. These are: (1) the Wavelength-Normalized phase velocity (WN) method, in which a nonlinear relationship between rockhead depth and wavelength is established; (2) the Statistically Determined-shear wave velocity (SD-V_s) method, in which the representative V_s value for rockhead is automatically determined using a statistical method; and (3) the empirical HVSR method, in which the rockhead is determined by interpreting resonant frequencies using a reliably calibrated empirical equation. These methods were implemented to determine rockhead depths at 28 locations across two distinct geological formations in Singapore, and the results were evaluated using borehole data. The WN method determines rockhead depths accurately and reliably with small absolute errors (average RMSE = 3.11 m), demonstrating robust performance across both geological formations. Its advantage lies in interpreting dispersion curves alone, without the need for an inversion process. The SD-V_s method is practical in engineering practice owing to its simplicity. The empirical HVSR method determines rockhead depths with moderate accuracy, benefiting from a reliably calibrated empirical equation.
Funding: Funded by the Third Xinjiang Scientific Expedition Program (2021xjkk1400), the National Natural Science Foundation of China (42071049), the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2019D01C022), the Xinjiang Uygur Autonomous Region Innovation Environment Construction Special Project & Science and Technology Innovation Base Construction Project (PT2107), and the Tianshan Talent-Science and Technology Innovation Team (2022TSYCTD0006).
Abstract: Snow cover plays a critical role in global climate regulation and hydrological processes. Accurate monitoring is essential for understanding snow distribution patterns, managing water resources, and assessing the impacts of climate change. Remote sensing has become a vital tool for snow monitoring, and the Moderate-resolution Imaging Spectroradiometer (MODIS) snow products from the Terra and Aqua satellites are widely used. However, cloud cover often interferes with snow detection, making cloud removal techniques crucial for reliable snow product generation. This study evaluated the accuracy of four MODIS snow cover datasets generated through different cloud removal algorithms. Using real-time field camera observations from four stations in the Tianshan Mountains, China, this study assessed the performance of these datasets during three distinct snow periods: the snow accumulation period (September-November), the snowmelt period (March-June), and the stable snow period (December-February of the following year). The findings showed that cloud-free snow products generated using the Hidden Markov Random Field (HMRF) algorithm consistently outperformed the others, particularly under cloud cover, while cloud-free snow products using near-day synthesis and the spatiotemporal adaptive fusion method with error correction (STAR) demonstrated varying performance depending on terrain complexity and cloud conditions. This study highlighted the importance of considering terrain features, land cover types, and snow dynamics when selecting cloud removal methods, particularly in areas with rapid snow accumulation and melting. The results suggested that future research should focus on improving cloud removal algorithms through the integration of machine learning, multi-source data fusion, and advanced remote sensing technologies. By expanding validation efforts and refining cloud removal strategies, more accurate and reliable snow products can be developed, contributing to enhanced snow monitoring and better management of water resources in alpine and arid areas.
Funding: Supported by the National Natural Science Foundation of China [grant numbers 12171158, 12371474 and 12571510] and the Fundamental Research Funds for the Central Universities [grant number 2025ECNU-WLJC006].
Abstract: This work contributes to the theoretical foundation for pricing in data markets and offers practical insights for managing digital data exchanges in the era of big data. We propose a structured pricing model for data exchanges transitioning from quasi-public to market-oriented operations. To address the complex dynamics among data exchanges, suppliers, and consumers, we develop a three-stage Stackelberg game framework. In this model, the data exchange acts as the leader setting transaction commission rates, suppliers are intermediate leaders determining unit prices, and consumers are followers making purchasing decisions. Two pricing strategies are examined: the Independent Pricing Approach (IPA) and the novel Perfectly Competitive Pricing Approach (PCPA), which accounts for competition among data providers. Using backward induction, the study derives subgame-perfect equilibria and proves the existence and uniqueness of Stackelberg equilibria under both approaches. Extensive numerical simulations carried out with the model demonstrate that PCPA enhances data demander utility, encourages supplier competition, increases transaction volume, and improves the overall profitability and sustainability of data exchanges. Social welfare analysis further confirms PCPA's superiority in promoting efficient and fair data markets.
Funding: Supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) under grant number IMSIU-DDRSP2601.
Abstract: Ovarian cancer (OC) is one of the leading causes of death related to gynecological cancer, the main difficulties being its early diagnosis and the heterogeneous nature of tumor biomarkers. Machine learning (ML) has the potential to process complex datasets and support decision-making in OC diagnosis. Nevertheless, traditional ML models tend to be biased, prone to overfitting, sensitive to noise, and poorly generalized. Moreover, their black-box nature reduces interpretability and limits their practical clinical applicability. In this study, we introduce an explainable ensemble learning (EL) model, TreeX-Stack, based on a stacking architecture that employs tree-based learners, namely Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGBoost), as base learners and Logistic Regression (LR) as the meta-learner to enhance OC diagnosis. Local Interpretable Model-Agnostic Explanations (LIME) are used to explain individual predictions, making the model outputs more clinically interpretable and applicable. The model is trained on a dataset that includes demographic information, blood tests, general chemistry, and tumor markers. Extensive preprocessing includes handling missing data using iterative imputation with Bayesian Ridge and addressing multicollinearity by removing features with correlation coefficients above 0.7. Relevant features are then selected using the Boruta feature selection method. To obtain robust and unbiased performance estimates during hyperparameter tuning, nested cross-validation (CV) with grid search is employed, and all experiments are repeated five times to ensure statistical reliability. TreeX-Stack demonstrates excellent diagnostic performance, achieving an accuracy of 0.9027, a precision of 0.8673, a recall of 0.9391, and an F1-score of 0.9012. Feature-importance analyses using LIME and permutation importance highlight Human Epididymis Protein 4 (HE4) as the most significant biomarker for OC. The combination of high predictive performance and interpretability makes TreeX-Stack a reliable tool for clinical decision support in OC diagnosis.
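The multicollinearity step in this abstract (removing features with correlation coefficients above 0.7) can be sketched as a greedy filter. The abstract does not say which feature of a correlated pair is dropped, so the rule used here, keeping the feature that appears first, is an assumption; the function names, feature names, and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.7):
    """Keep a feature only if |r| <= threshold against every feature kept so far."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns; f2 is almost a multiple of f1.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_correlated(features)
```

Here `f2` is discarded because it is almost perfectly correlated with `f1`, while `f3` (r = -0.3 against `f1`) is kept. Because the greedy rule depends on column order, a pipeline that needs a specific biomarker retained would have to place it first or use a different tie-breaking rule.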
Funding: Supported by the International Partnership Program of the Chinese Academy of Sciences (170GJHZ2023074GC), the National Natural Science Foundation of China (42425706 and 42488201), the National Key Research and Development Program of China (2024YFF0807902), the Beijing Natural Science Foundation (8242041), and the China Postdoctoral Science Foundation (2025M770353).
Abstract: Accurately assessing the relationship between tree growth and climatic factors is of great importance in dendrochronology. This study evaluated the consistency between alternative climate datasets (station and gridded data) and actual climate data (fixed-point observations near the sampling sites) in northeastern China’s warm temperate zone and analyzed differences in their correlations with the tree-ring width index. The results showed that: (1) Gridded temperature data, as well as precipitation and relative humidity data from the Huailai meteorological station, were more consistent with the actual climate data; in contrast, gridded soil moisture content data showed significant discrepancies. (2) Horizontal distance had a greater impact on how well a dataset represented actual climate conditions than vertical elevation differences. (3) Differences in consistency between alternative and actual climate data also affected their correlations with tree-ring width indices. In some growing-season months, the correlation coefficients differed significantly, in both magnitude and sign, from those based on actual data. The choice of alternative climate dataset can therefore bias assessments of forest responses to climate change, which is detrimental to the management of forest ecosystems in harsh environments. The scientific and rational selection of alternative climate data is thus essential for dendroecological and climatological research.
Abstract: To address the severe challenges of PM2.5 and ozone co-control during the "14th Five-Year Plan" period and to enhance the precision and intelligence of air environment governance, it is imperative to build an efficient comprehensive management platform for regional air quality. In this paper, the practice in Zibo City, Shandong Province, is taken as an example to systematically analyze the top-level design, technical implementation, and innovative application of a comprehensive management platform for regional air quality integrating "perception monitoring, data fusion, research judgment of early warnings, analysis of sources, collaborative dispatching, and evaluation assessment". Through the construction of a "sky-air-ground" integrated three-dimensional monitoring network, the platform integrates multi-source heterogeneous environmental data and employs big data, cloud computing, artificial intelligence, and numerical models such as CALPUFF/CMAQ to achieve comprehensive perception, precise prediction, intelligent source tracing, and closed-loop management of air pollution. The platform establishes a full-process closed-loop management mechanism of "data-early warning-disposition-evaluation" and shifts environmental supervision from passive response to active anticipation and from experience-based judgment to data-driven decision-making. The application results show that the platform significantly improves the scientific decision-making ability and collaborative execution efficiency of air pollution governance in Zibo City, providing a replicable and scalable comprehensive solution for similar industrial cities seeking continuous improvement of air quality.
Funding: Supported by the National Natural Science Foundation of China (91959106); the Foundation of the Shanghai Municipal Education Commission (24RGZNC02); the Shanghai Key Laboratory of Intelligent Information Processing, Fudan University (IIPL-2025-RD3-02); the Key University Science Research Project of Anhui Province (2023AH030108); the Climbing Peak Training Program for Innovative Technology Team of Yijishan Hospital, Wannan Medical College (PF201904); the Peak Training Program for Scientific Research of Yijishan Hospital, Wannan Medical College (GF2019G15); and the talent project of the First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College) (YR202422).
Abstract: tRNA-derived small RNAs (tsRNAs), a class of regulatory small noncoding RNAs, have been implicated in a wide variety of human diseases. A large number of tsRNA–disease associations have been identified in recent years. However, repositories that catalog detailed information on tsRNA–disease associations are scarce. In this study, we provide the tsRNADisease database, which integrates experimentally and computationally supported tsRNA–disease associations obtained through manual curation of the literature and other related resources. tsRNADisease contains 5571 manually curated associations between 4759 tsRNAs and 166 diseases with experimental evidence from 346 studies. In addition, it contains 5013 predicted associations between 1297 tsRNAs and 111 diseases. tsRNADisease provides a user-friendly interface for browsing, retrieving, and downloading data. This database can improve our understanding of tsRNA deregulation in diseases and serve as a valuable resource for investigating the mechanisms of disease-related tsRNAs. tsRNADisease is freely available at http://www.compgenelab.info/tsRNADisease.
Funding: Supported by the National Key R&D Program of China (No. 2021YFF0501301) and the National Natural Science Foundation of China (No. 42172231).
Abstract: 0 INTRODUCTION
Earth science is a natural science concerned with the composition, dynamics, spatiotemporal evolution, and formation mechanisms of Earth materials (Chen and Yang, 2023). Traditional Earth science research has largely been discipline-based, relying on field investigations, data collection, experimental analyses, and data interpretation to study individual components of the Earth system.