Data hiding methods involve embedding secret messages into cover objects to enable covert communication in a way that is difficult to detect. In data hiding methods based on image interpolation, the image size is reduced and then enlarged through interpolation, followed by the embedding of secret data into the newly generated pixels. A general approach for improving the embedding of secret messages is proposed. The approach may be regarded as a general model for enhancing the data embedding capacity of various existing image interpolation-based data hiding methods. This enhancement is achieved by expanding the range of pixel values available for embedding secret messages, removing the limitation of many existing methods, in which the range is restricted to powers of two to facilitate the direct embedding of bit-based messages. The improvement is accomplished by applying multiple-based number conversion to the secret message data. The method converts the message bits into a multiple-based number and uses an algorithm to embed each digit of this number into an individual pixel, thereby enhancing the message embedding efficiency, as proved by a theorem derived in this study. The proposed improvement method has been tested through experiments on three well-known image interpolation-based data hiding methods. The results show that the proposed method can enhance the three data embedding rates by approximately 14%, 13%, and 10%, respectively, create stego-images of good quality, and resist RS steganalysis attacks. These experimental results indicate that using the multiple-based number conversion technique to improve the three interpolation-based methods increases the number of message bits embedded in the images. For many other image interpolation-based data hiding methods that use power-of-two pixel-value ranges for message embedding, the proposed improvement is also expected to be effective in enhancing their data embedding capabilities.
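As an illustration of the multiple-based (mixed-radix) conversion step, the sketch below packs a bit string into one digit per interpolated pixel, where each pixel's capacity (its base) need not be a power of two. The function names, the example bases, and the packing order are assumptions for illustration; they do not reproduce the embedding algorithm or the theorem of the paper.

def bits_to_mixed_radix(bits: str, bases: list[int]) -> list[int]:
    # Treat the message bits as one large integer, then peel off one digit per pixel,
    # where pixel i may carry any digit in [0, bases[i]).
    value = int(bits, 2)
    digits = []
    for b in bases:
        digits.append(value % b)
        value //= b
    if value != 0:
        raise ValueError("message too long for the available embedding capacity")
    return digits

def mixed_radix_to_bits(digits: list[int], bases: list[int], n_bits: int) -> str:
    # Inverse conversion used at extraction time.
    value = 0
    for d, b in zip(reversed(digits), reversed(bases)):
        value = value * b + d
    return format(value, f"0{n_bits}b")

# Three interpolated pixels with non-power-of-two capacities 5, 6 and 7 can carry
# 7 message bits because 5 * 6 * 7 = 210 >= 2**7.
bases = [5, 6, 7]
digits = bits_to_mixed_radix("1010011", bases)
assert mixed_radix_to_bits(digits, bases, 7) == "1010011"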
The uniaxial compressive strength (UCS) of rocks is a vital geomechanical parameter widely used for rock mass classification, stability analysis, and engineering design in rock engineering. Various UCS testing methods and apparatuses have been proposed over the past few decades. The objective of the present study is to summarize the status and development of the theories, test apparatuses, and data processing of existing UCS testing methods. It starts by elaborating the theories of these test methods. The test apparatuses and development trends for UCS measurement are then summarized, followed by a discussion of rock specimens for the test apparatuses and of data processing methods. Next, recommendations are given for selecting a method for UCS measurement. The review reveals that the rock failure mechanisms in UCS testing methods can be divided into compression-shear, compression-tension, composite failure mode, and no obvious failure mode. These apparatuses are trending towards automation, digitization, precision, and multi-modal testing. Two size correction methods are commonly used: one develops an empirical correlation between the measured indices and the specimen size, and the other uses a standard specimen to calculate a size correction factor. Three to five input parameters are commonly utilized in soft computing models to predict the UCS of rocks. The test method for UCS measurement can be selected according to the testing scenario and the specimen size. Engineers can thereby gain a comprehensive understanding of UCS testing methods and their potential developments in various rock engineering endeavors.
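As a concrete illustration of the "standard specimen" correction mentioned above, the sketch below applies the widely cited Hoek-Brown size-correction relation UCS_50 = UCS_d * (d/50)^0.18; this particular relation and exponent are given only as an example and are not necessarily the corrections adopted by the methods reviewed here.

def ucs_corrected_to_50mm(ucs_measured_mpa: float, diameter_mm: float, exponent: float = 0.18) -> float:
    # Scale a UCS measured on a core of diameter d (mm) to the equivalent 50 mm standard specimen.
    return ucs_measured_mpa * (diameter_mm / 50.0) ** exponent

print(ucs_corrected_to_50mm(95.0, 38.0))   # a 38 mm core reading expressed at the 50 mm standard size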
High-quality data is essential for the success of data-driven learning tasks. The characteristics, precision, and completeness of the datasets critically determine the reliability, interpretability, and effectiveness of subsequent analyses and applications, such as fault detection, predictive maintenance, and process optimization. However, for many industrial processes, obtaining sufficient high-quality data remains a significant challenge due to high costs, safety concerns, and practical constraints. To overcome these challenges, data augmentation has emerged as a rapidly growing research area, attracting considerable attention across both academia and industry. By expanding datasets, data augmentation techniques enable better generalization and more robust performance in real applications. This paper provides a comprehensive, multi-perspective review of data augmentation methods for industrial processes. For clarity and organization, existing studies are systematically grouped into four categories: small sample with low dimension, small sample with high dimension, large sample with low dimension, and large sample with high dimension. Within this framework, the review examines current research from both methodological and application-oriented perspectives, highlighting the main methods, advantages, and limitations. By synthesizing these findings, this review offers a structured overview for scholars and practitioners, serving as a valuable reference for newcomers and experienced researchers seeking to explore and advance data augmentation techniques in industrial processes.
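For readers new to the topic, the sketch below shows two of the simplest augmentation operators often applied to small-sample process signals, additive Gaussian jitter and per-sample magnitude scaling; these generic operators are only examples and are not the specific techniques catalogued in the review.

import numpy as np

rng = np.random.default_rng(0)

def jitter(x: np.ndarray, sigma: float = 0.03) -> np.ndarray:
    # Additive Gaussian noise on every time step.
    return x + rng.normal(0.0, sigma, size=x.shape)

def magnitude_scale(x: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    # One random scaling factor per sample.
    return x * rng.normal(1.0, sigma, size=(x.shape[0], 1))

batch = rng.normal(size=(8, 100))                       # 8 process-signal samples x 100 time steps
augmented = np.concatenate([batch, jitter(batch), magnitude_scale(batch)], axis=0)
print(augmented.shape)                                  # (24, 100): the original plus two augmented copies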
Substantial advancements have been achieved in Tunnel Boring Machine (TBM) technology and monitoring systems, yet the presence of missing data impedes accurate analysis and interpretation of TBM monitoring results. This study investigates the issue of missing data in extensive TBM datasets. Through a comprehensive literature review, we analyze the mechanisms of missing TBM data and compare different imputation methods, including statistical analysis and machine learning algorithms. We also examine the impact of various missing patterns and rates on the efficacy of these methods. Finally, we propose a dynamic interpolation strategy tailored for TBM engineering sites. The results show that the K-Nearest Neighbors (KNN) and Random Forest (RF) algorithms achieve good imputation results; that the performance of all methods decreases as the missing rate increases; and that block missing is the hardest pattern to impute, followed by mixed missing, while sporadic missing is imputed best. On-site application results validate the proposed strategy's capability to achieve robust missing value imputation, applicable in ML scenarios such as parameter optimization, attitude warning, and pressure prediction. These findings contribute to enhancing the efficiency of TBM missing data processing, offering more effective support for large-scale TBM monitoring datasets.
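A minimal sketch of the KNN branch of such an imputation workflow is given below, assuming the monitoring channels sit in a pandas DataFrame; the column names and the synthetic data are placeholders, not an actual TBM dataset.

import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
tbm_df = pd.DataFrame(rng.normal(size=(200, 4)),
                      columns=["thrust", "torque", "advance_rate", "cutterhead_speed"])
tbm_df.iloc[rng.integers(0, 200, 30), 1] = np.nan     # simulate sporadic missing torque values

imputer = KNNImputer(n_neighbors=5, weights="distance")
tbm_filled = pd.DataFrame(imputer.fit_transform(tbm_df), columns=tbm_df.columns)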
As pivotal supporting technologies for smart manufacturing and digital engineering, model-based and data-driven methods have been widely applied in many industrial fields, such as product design, process monitoring, and smart maintenance. While promising, both methods have issues that need to be addressed. For example, model-based methods are limited by low computational accuracy and a high computational burden, and data-driven methods often suffer from poor interpretability and redundant features. To address these issues, the concept of data-model fusion (DMF) has emerged as a promising solution. DMF integrates model-based methods with data-driven methods by incorporating big data into model-based methods or embedding relevant domain knowledge into data-driven methods. Despite growing efforts in the field of DMF, a unanimous definition of DMF remains elusive, and a general framework for DMF has rarely been discussed. This paper aims to address this gap by providing a thorough overview and categorization of both data-driven methods and model-based methods. Subsequently, this paper presents a definition and categorization of DMF and discusses a general framework for DMF. Moreover, the seven primary applications of DMF are reviewed within the context of smart manufacturing and digital engineering. Finally, this paper outlines future directions for DMF.
Snow cover in mountainous areas is characterized by high reflectivity, strong spatial heterogeneity, rapid changes, and susceptibility to cloud interference. However, due to the limitations of any single sensor, it is challenging to obtain high-resolution satellite remote sensing data for monitoring the dynamic changes of snow cover within a day. This study focuses on two typical data fusion methods for polar-orbiting satellites (Sentinel-3 SLSTR) and geostationary satellites (Himawari-9 AHI), and explores the snow cover detection accuracy of a multitemporal cloud-gap snow cover identification model (loose data fusion) and the ESTARFM (spatiotemporal data fusion). Taking the Qilian Mountains as the research area, the accuracy of the two data fusion results was verified using snow cover extracted from Landsat-8 SR products. The results showed that both data fusion models could effectively capture the spatiotemporal variations of snow cover, but the ESTARFM demonstrated superior performance: it not only produced fusion images at any target time, but also extracted snow cover closer to the spatial distribution of real satellite images. Therefore, the ESTARFM was used to fuse images for hourly reconstruction of the snow cover on February 14–15, 2023. The maximum snow cover area of this snowfall reached 83.84% of the Qilian Mountains, and the snow melted extremely rapidly, with changes of up to 4.30% of the study area per hour. This study offers reliable high spatiotemporal resolution satellite remote sensing data for monitoring snow cover changes in mountainous areas, contributing to more accurate and timely assessments.
Accurate acquisition and prediction of the acoustic parameters of seabed sediments are crucial in marine sound propagation research. While the relationship between sound velocity and the physical properties of sediment has been extensively studied, there is still no consensus on the correlation between the acoustic attenuation coefficient and sediment physical properties, and predicting the acoustic attenuation coefficient remains a challenging issue in sedimentary acoustic research. In this study, we propose a prediction method for the acoustic attenuation coefficient using machine learning algorithms, specifically random forest (RF), support vector regression (SVR), and convolutional neural network (CNN) algorithms. We used the acoustic attenuation coefficient and sediment particle size data from 52 stations as training data, with the particle size parameters as the input feature matrix and the measured acoustic attenuation as the training label, to validate the attenuation prediction models. Our results indicate that the error of the attenuation prediction models is small. Among the three models, the RF model exhibited the lowest prediction error, with a mean squared error of 0.8232, a mean absolute error of 0.6613, and a root mean squared error of 0.9073. Additionally, when we applied the models to predict data collected at different times in the same region, the models developed in this study demonstrated a certain level of reliability in real prediction scenarios. Our approach demonstrates that constructing a sediment acoustic characteristics model based on machine learning is feasible to a certain extent and offers a novel perspective for studying sediment acoustic properties.
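A hedged sketch of the random forest branch of this workflow is shown below, with synthetic data standing in for the 52-station measurements; the feature names are placeholders, and the printed errors are computed in the same way as the metrics reported above (MSE, MAE, RMSE).

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(52, 3))            # e.g. mean grain size, sorting, clay fraction (placeholders)
y = 0.5 * X[:, 0] + rng.normal(0, 0.5, 52)      # synthetic attenuation coefficient standing in for measurements

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

mse = mean_squared_error(y_te, pred)
print(f"MSE={mse:.4f}  MAE={mean_absolute_error(y_te, pred):.4f}  RMSE={np.sqrt(mse):.4f}")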
Seismic data plays a pivotal role in fault detection, offering critical insights into subsurface structures and seismic hazards. Understanding fault detection from seismic data is essential for mitigating seismic risks and guiding land-use planning. This paper presents a comprehensive review of existing methodologies for fault detection, focusing on the application of Machine Learning (ML) and Deep Learning (DL) techniques to enhance accuracy and efficiency. Various ML and DL approaches are analyzed with respect to fault segmentation, adaptive learning, and fault detection models. These techniques, benchmarked against established seismic datasets, reveal significant improvements over classical methods in terms of accuracy and computational efficiency. Additionally, this review highlights emerging trends, including hybrid model applications and the integration of real-time data processing for seismic fault detection. By providing a detailed comparative analysis of current methodologies, this review aims to guide future research and foster advancements in the effectiveness and reliability of seismic studies. Ultimately, the study seeks to bridge the gap between theoretical investigations and practical implementations in fault detection.
When assessing seismic liquefaction potential with data-driven models, addressing the uncertainties in model establishment, in the interpretation of cone penetration test (CPT) data, and in the decision threshold is crucial for avoiding biased data selection, ameliorating overconfident models, and remaining flexible to varying practical objectives, especially when the training and testing data are not identically distributed. A workflow leveraging Bayesian methodology was proposed to address these issues. Employing a Multi-Layer Perceptron (MLP) as the foundational model, this approach was benchmarked against empirical methods and advanced algorithms for its simplicity, accuracy, and resistance to overfitting. The analysis revealed that, while MLP models optimized via a maximum a posteriori algorithm suffice for straightforward scenarios, Bayesian neural networks show great potential for preventing overfitting. Additionally, integrating decision thresholds through various evaluative principles offers insights for challenging decisions. Two case studies demonstrate the framework's capacity for nuanced interpretation of in situ data, employing a model committee for a detailed evaluation of liquefaction potential via Monte Carlo simulations and basic statistics. Overall, the proposed step-by-step workflow for analyzing seismic liquefaction incorporates multifold testing and real-world data validation, showing improved robustness against overfitting and greater versatility in addressing practical challenges. This research contributes to the seismic liquefaction assessment field by providing a structured, adaptable methodology for accurate and reliable analysis.
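The sketch below illustrates the "model committee plus decision threshold" idea with a bootstrap ensemble of scikit-learn MLP classifiers on synthetic CPT-like features; the features, the committee size, and the 0.4 threshold are assumptions for illustration, not the Bayesian networks or thresholds of the study.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                    # placeholder CPT-derived features (e.g. qc, fs, depth, CSR)
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 0.5, 300) > 0).astype(int)   # synthetic liquefied/non-liquefied labels

committee = []
for seed in range(20):                           # 20 bootstrap replicates form the committee
    Xb, yb = resample(X, y, random_state=seed)
    committee.append(MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                   random_state=seed).fit(Xb, yb))

proba = np.mean([m.predict_proba(X)[:, 1] for m in committee], axis=0)   # committee-averaged probability
threshold = 0.4                                  # e.g. a conservative screening threshold
flagged = proba > threshold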
Accurate determination of rockhead is crucial for underground construction. Traditionally, borehole data are mainly used for this purpose. However, borehole drilling is costly, time-consuming, and sparsely distributed. Non-invasive geophysical methods, particularly those using passive seismic surface waves, have emerged as viable alternatives for geological profiling and rockhead detection. This study proposes three interpretation methods for rockhead determination using passive seismic surface wave data from Microtremor Array Measurement (MAM) and Horizontal-to-Vertical Spectral Ratio (HVSR) tests. These are: (1) the Wavelength-Normalized phase velocity (WN) method, in which a nonlinear relationship between rockhead depth and wavelength is established; (2) the Statistically Determined shear wave velocity (SD-Vs) method, in which the representative Vs value for rockhead is automatically determined using a statistical method; and (3) the empirical HVSR method, in which the rockhead is determined by interpreting resonant frequencies using a reliably calibrated empirical equation. These methods were applied to determine rockhead depths at 28 locations across two distinct geological formations in Singapore, and the results were evaluated against borehole data. The WN method determines rockhead depths accurately and reliably with minimal absolute errors (average RMSE = 3.11 m), demonstrating robust performance across both geological formations. Its advantage lies in interpreting dispersion curves alone, without the need for an inversion process. The SD-Vs method is practical in engineering practice owing to its simplicity. The empirical HVSR method determines rockhead depths with moderate accuracy, benefiting from a reliably calibrated empirical equation.
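As a generic illustration of interpreting an HVSR resonant frequency, the sketch below uses the classical quarter-wavelength relation H = Vs / (4 f0); the paper calibrates its own empirical equation, which is not reproduced here, so treat this as a placeholder rather than the authors' formula.

def rockhead_depth_quarter_wavelength(vs_sediment_mps: float, f0_hz: float) -> float:
    # Depth to the strong impedance contrast from the HVSR resonant frequency f0,
    # assuming a uniform sediment shear-wave velocity above rockhead.
    return vs_sediment_mps / (4.0 * f0_hz)

print(rockhead_depth_quarter_wavelength(250.0, 2.0))   # about 31 m for Vs = 250 m/s and f0 = 2 Hz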
Snow cover plays a critical role in global climate regulation and hydrological processes. Accurate monitoring is essential for understanding snow distribution patterns, managing water resources, and assessing the impacts of climate change. Remote sensing has become a vital tool for snow monitoring, with the Moderate Resolution Imaging Spectroradiometer (MODIS) snow products from the Terra and Aqua satellites being widely used. However, cloud cover often interferes with snow detection, making cloud removal techniques crucial for reliable snow product generation. This study evaluated the accuracy of four MODIS snow cover datasets generated through different cloud removal algorithms. Using real-time field camera observations from four stations in the Tianshan Mountains, China, this study assessed the performance of these datasets during three distinct snow periods: the snow accumulation period (September-November), the snowmelt period (March-June), and the stable snow period (December-February of the following year). The findings showed that cloud-free snow products generated using the Hidden Markov Random Field (HMRF) algorithm consistently outperformed the others, particularly under cloud cover, while cloud-free snow products based on near-day synthesis and the spatiotemporal adaptive fusion method with error correction (STAR) demonstrated varying performance depending on terrain complexity and cloud conditions. This study highlighted the importance of considering terrain features, land cover types, and snow dynamics when selecting cloud removal methods, particularly in areas with rapid snow accumulation and melting. The results suggested that future research should focus on improving cloud removal algorithms through the integration of machine learning, multi-source data fusion, and advanced remote sensing technologies. By expanding validation efforts and refining cloud removal strategies, more accurate and reliable snow products can be developed, contributing to enhanced snow monitoring and better management of water resources in alpine and arid areas.
With the rapid growth of biomedical data, particularly multi-omics data spanning genomics, transcriptomics, proteomics, metabolomics, and epigenomics, medical research and clinical decision-making confront both new opportunities and obstacles. The huge and diversified nature of these datasets cannot always be managed using traditional data analysis methods. As a consequence, deep learning has emerged as a strong tool for analysing multi-omics data due to its ability to handle complex and non-linear relationships. This paper explores the fundamental concepts of deep learning and how they are used in multi-omics medical data mining. We demonstrate how autoencoders, variational autoencoders, multimodal models, attention mechanisms, transformers, and graph neural networks enable pattern analysis and recognition across omics data. Deep learning has been found to be effective in disease classification, biomarker identification, gene network learning, and therapeutic efficacy prediction. We also consider critical problems such as data quality, model explainability, reproducibility of findings, and computational power requirements. We then consider future directions for combining omics with clinical and imaging data, explainable AI, federated learning, and real-time diagnostics. Overall, this study emphasises the need for cross-disciplinary collaboration to advance deep learning-based multi-omics research for precision medicine and for understanding complicated disorders.
High-throughput transcriptomics has evolved from bulk RNA-seq to single-cell and spatial profiling, yet its clinical translation still depends on effective integration across diverse omics and data modalities. Emerging foundation models and multimodal learning frameworks are enabling scalable and transferable representations of cellular states, while advances in interpretability and real-world data integration are bridging the gap between discovery and clinical application. This paper outlines a concise roadmap for AI-driven, transcriptome-centered multi-omics integration in precision medicine (Figure 1).
Gastrointestinal tumors require personalized treatment strategies due to their heterogeneity and complexity. Multimodal artificial intelligence (AI) addresses this challenge by integrating diverse data sources, including computed tomography (CT), magnetic resonance imaging (MRI), endoscopic imaging, and genomic profiles, to enable intelligent decision-making for individualized therapy. This approach leverages AI algorithms to fuse imaging, endoscopic, and omics data, facilitating comprehensive characterization of tumor biology, prediction of treatment response, and optimization of therapeutic strategies. By combining CT and MRI for structural assessment, endoscopic data for real-time visual inspection, and genomic information for molecular profiling, multimodal AI enhances the accuracy of patient stratification and treatment personalization. The clinical implementation of this technology demonstrates potential for improving patient outcomes, advancing precision oncology, and supporting individualized care in gastrointestinal cancers. Ultimately, multimodal AI serves as a transformative tool in oncology, bridging data integration with clinical application to effectively tailor therapies.
Objective: To analyze the composition patterns of Chinese patent medicines for anti-influenza and to develop new anti-influenza prescriptions using unsupervised data mining methods. Methods: Chinese patent medicine recipes for anti-influenza were collected and recorded in a database, and then the correlation coefficients between herbs, the core herb combinations, and new prescriptions were analyzed using modified mutual information, complex system entropy clustering, and unsupervised hierarchical clustering, respectively. Results: Based on the analysis of 126 Chinese patent medicine recipes, the frequency of occurrence of each herb in these recipes, 54 frequently used herb pairs, and 34 core combinations were determined, and 4 new recipes for influenza were developed. Conclusion: Unsupervised data mining methods can quickly mine the composition patterns and develop new prescriptions.
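A simplified stand-in for the association step is sketched below: herb co-occurrence is counted across recipes and pairs are ranked by pointwise mutual information. The paper's modified mutual information and entropy-based clustering are not reproduced, and the toy recipes and herb names are invented for illustration.

from collections import Counter
from itertools import combinations
from math import log

recipes = [   # toy recipe database; herb names are illustrative only
    {"Lonicerae Flos", "Forsythiae Fructus", "Glycyrrhizae Radix"},
    {"Lonicerae Flos", "Forsythiae Fructus", "Isatidis Radix"},
    {"Isatidis Radix", "Glycyrrhizae Radix"},
]

n = len(recipes)
herb_freq = Counter(h for r in recipes for h in r)
pair_freq = Counter(frozenset(p) for r in recipes for p in combinations(sorted(r), 2))

def pmi(pair: frozenset) -> float:
    # Pointwise mutual information of a herb pair relative to independent occurrence.
    a, b = tuple(pair)
    return log((pair_freq[pair] / n) / ((herb_freq[a] / n) * (herb_freq[b] / n)))

for p in sorted(pair_freq, key=pmi, reverse=True)[:3]:
    print(sorted(p), round(pmi(p), 3))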
Time series forecasting has become an important aspect of data analysis and has many real-world applications. However, undesirable missing values are often encountered, which may adversely affect many forecasting tasks. In this study, we evaluate and compare the effects of imputation methods for estimating missing values in a time series. Our approach does not include a simulation to generate pseudo-missing data; instead, we perform imputation on actual missing data and measure the performance of the forecasting model created from the imputed data. In the experiments, therefore, several time series forecasting models are trained using different training datasets prepared with each imputation method. Subsequently, the performance of the imputation methods is evaluated by comparing the accuracy of the forecasting models. The results obtained from a total of four experimental cases show that the k-nearest neighbor technique is the most effective at reconstructing missing data and contributes positively to time series forecasting compared with other imputation methods.
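The evaluation protocol can be sketched as follows, under stated assumptions: one gappy synthetic series is filled by several imputation methods (KNN applied in a lag-embedded space), the same simple autoregressive forecaster is trained on each filled series, and the forecasters are scored on a clean held-out tail. None of the names or settings below come from the paper.

import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
series = pd.Series(np.sin(np.arange(400) / 20.0) + 0.1 * rng.normal(size=400))
series.iloc[rng.integers(10, 300, size=40)] = np.nan       # gaps only in the training portion

emb = pd.concat({f"lag{i}": series.shift(i) for i in range(4)}, axis=1)   # delay embedding for KNN
knn_filled = pd.Series(KNNImputer(n_neighbors=5).fit_transform(emb)[:, 0], index=series.index)

candidates = {"linear": series.interpolate(), "ffill": series.ffill().bfill(), "knn": knn_filled}

def autoregressive_frame(s: pd.Series, n_lags: int = 5) -> pd.DataFrame:
    # Lagged predictors plus the target value for a plain autoregressive forecaster.
    return pd.DataFrame({f"lag{i}": s.shift(i) for i in range(1, n_lags + 1)}).assign(y=s).dropna()

for name, filled in candidates.items():
    frame = autoregressive_frame(filled)
    train, test = frame[frame.index < 300], frame[frame.index >= 300]
    model = LinearRegression().fit(train.drop(columns="y"), train["y"])
    mae = mean_absolute_error(test["y"], model.predict(test.drop(columns="y")))
    print(f"{name}: MAE = {mae:.4f}")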
In this study, we have used four methods to investigate the start of the growing season (SGS) on the Tibetan Plateau (TP) from 1982 to 2012, using Normalized Difference Vegetation Index (NDVI) data obtained from Global Inventory Modeling and Mapping Studies (GIMMS, 1982-2006) and SPOT VEGETATION (SPOT-VGT, 1999-2012). SGS values estimated using the four methods show similar spatial patterns along latitudinal or altitudinal gradients, but with significant variations in the SGS dates. The largest discrepancies are mainly found in the regions with the highest or the lowest vegetation coverage. Between 1982 and 1998, the SGS values derived from the four methods all display an advancing trend; however, according to the more recent SPOT-VGT data (1999-2012), there is no continuously advancing trend of SGS on the TP. Analysis of the correlation between the SGS values derived from GIMMS and SPOT between 1999 and 2006 demonstrates consistency in the tendency with regard both to the data sources and to the four analysis methods used. Compared with the other methods, the greatest consistency between the in situ data and the retrieved SGS values is obtained with Method 3 (threshold of NDVI ratio). To avoid error in a vast region with diverse vegetation types and physical environments, it is critical to know the seasonal change characteristics of the different vegetation types, particularly in areas with sparse grassland or evergreen forest.
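A hedged sketch of an NDVI-ratio threshold rule in the spirit of Method 3 is given below: the annual NDVI curve is normalized between its minimum and maximum, and the SGS is taken as the first composite date at which the ratio crosses a chosen threshold. The 0.5 threshold and the toy seasonal curve are illustrative assumptions, not the calibration used in the study.

import numpy as np

def sgs_from_ndvi_ratio(doy: np.ndarray, ndvi: np.ndarray, threshold: float = 0.5) -> float:
    # Normalize the annual curve and return the first day-of-year crossing the threshold.
    ratio = (ndvi - ndvi.min()) / (ndvi.max() - ndvi.min())
    above = np.flatnonzero(ratio >= threshold)
    return float(doy[above[0]]) if above.size else float("nan")

doy = np.arange(1, 366, 15)                                  # 15-day composites
ndvi = 0.15 + 0.55 / (1 + np.exp(-(doy - 140) / 12.0))       # toy seasonal NDVI curve
print(sgs_from_ndvi_ratio(doy, ndvi))                        # first composite date past the mid-rise of the toy curve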
Materials development has historically been driven by human needs and desires, and this is likely to continue in the foreseeable future. The global population is expected to reach ten billion by 2050, which will promote increasingly large demands for clean and high-efficiency energy, personalized consumer products, secure food supplies, and professional healthcare. New functional materials that are made and tailored for targeted properties or behaviors will be the key to tackling this challenge. Traditionally, advanced materials are found empirically or through experimental trial-and-error approaches. As big data generated by modern experimental and computational techniques becomes more readily available, data-driven or machine learning (ML) methods have opened new paradigms for the discovery and rational design of materials. In this review article, we provide a brief introduction to various ML methods and related software or tools. Main ideas and basic procedures for employing ML approaches in materials research are highlighted. We then summarize recent important applications of ML for the large-scale screening and optimal design of polymer and porous materials, catalytic materials, and energetic materials. Finally, concluding remarks and an outlook are provided.
The High Precision Magnetometer (HPM) on board the China Seismo-Electromagnetic Satellite (CSES) allows highly accurate measurement of the geomagnetic field; it includes FGM (Fluxgate Magnetometer) and CDSM (Coupled Dark State Magnetometer) probes. This article introduces the main processing methods, algorithms, and processing procedure for the HPM data. First, the FGM and CDSM probes are calibrated according to ground sensor data. Then the FGM linear parameters are corrected in orbit by applying an absolute vector magnetic field correction algorithm based on CDSM data. At the same time, the magnetic interference of the satellite is eliminated according to ground-satellite magnetic test results. Finally, according to the characteristics of the magnetic field direction in the low-latitude region, the transformation matrix between the FGM probe and the star sensor is calibrated in orbit to determine the correct direction of the magnetic field. Comparing the magnetic field data of the CSES and SWARM satellites over five consecutive geomagnetically quiet days, the difference in the vector magnetic field measurements is about 10 nT, which is within the uncertainty interval of geomagnetic disturbance.
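The linear FGM correction step can be illustrated generically as below: raw vector readings pass through per-axis offsets, scale factors, and a non-orthogonality/alignment matrix, and the corrected magnitude is compared against a scalar reference in the spirit of the CDSM-based in-orbit correction. All numbers are invented for illustration and are not CSES/HPM calibration parameters.

import numpy as np

offsets = np.array([12.0, -8.0, 5.0])                 # nT, per-axis zero offsets (illustrative)
scale = np.diag([1.002, 0.998, 1.001])                # per-axis scale factors (illustrative)
align = np.array([[1.0, 0.0015, -0.0008],             # small non-orthogonality/alignment terms (illustrative)
                  [0.0, 1.0,     0.0011],
                  [0.0, 0.0,     1.0]])
correction = np.linalg.inv(align @ scale)             # maps offset-removed raw readings to the field vector

b_raw = np.array([21034.0, -3541.0, 38712.0])         # one raw FGM sample in nT (illustrative)
b_vec = correction @ (b_raw - offsets)                # calibrated vector field

b_scalar_ref = 44190.0                                # scalar reference in nT, standing in for a CDSM reading
residual = np.linalg.norm(b_vec) - b_scalar_ref       # scalar residual used to refine the linear parameters
print(b_vec, residual)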