Journal Articles (11,481 results)
1. Advances in Machine Learning for Explainable Intrusion Detection Using Imbalance Datasets in Cybersecurity with Harris Hawks Optimization
Authors: Amjad Rehman, Tanzila Saba, Mona M. Jamjoom, Shaha Al-Otaibi, Muhammad I. Khan. Computers, Materials & Continua, 2026, Issue 1, pp. 1804-1818 (15 pages)
Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with sophisticated data preprocessing, incorporating both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized and model-ready inputs. Dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks' hunting strategies. HHO efficiently identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners. This layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded with an accuracy of 99.44% for UNSW-NB15, demonstrating the model's effectiveness. After balancing, the model showed a clear improvement in detecting attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
Keywords: intrusion detection; XAI; machine learning; ensemble method; cybersecurity; imbalanced data
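The SMOTE oversampling step mentioned in this abstract can be illustrated with a minimal pure-Python sketch; the function, k value, and sample points below are illustrative assumptions, not the authors' implementation:

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Basic SMOTE: create synthetic minority samples by interpolating
    a random minority point toward one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (excluding x)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(x, nb)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote(minority, n_new=4)
print(len(new_points))  # → 4
```

Each synthetic point lies on a segment between two real minority samples, so the augmented class stays inside the original feature region.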
2. Multivariate Data Anomaly Detection Based on Graph Structure Learning
Authors: Haoxiang Wen, Zhaoyang Wang, Zhonglin Ye, Haixing Zhao, Maosong Sun. Computer Modeling in Engineering & Sciences, 2026, Issue 1, pp. 1174-1206 (33 pages)
Multivariate anomaly detection plays a critical role in maintaining the stable operation of information systems. However, in existing research, multivariate data are often influenced by various factors during the collection process, resulting in temporal misalignment or displacement. Due to these factors, the node representations carry substantial noise, which reduces the adaptability of the multivariate coupled network structure and subsequently degrades anomaly detection performance. Accordingly, this study proposes a novel multivariate anomaly detection model grounded in graph structure learning. First, a recommendation strategy is employed to identify strongly coupled variable pairs, which are then used to construct a recommendation-driven multivariate coupling network. Second, a multi-channel graph encoding layer dynamically optimizes the structural properties of the multivariate coupling network, while a multi-head attention mechanism enhances the spatial characteristics of the multivariate data. Finally, unsupervised anomaly detection is conducted using a dynamic threshold selection algorithm. Experimental results demonstrate that effectively integrating the structural and spatial features of multivariate data significantly mitigates anomalies caused by temporal dependency misalignment.
Keywords: multivariate data; anomaly detection; graph structure learning; coupled network
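As a rough illustration of dynamic threshold selection (a generic rolling-statistics rule, not the paper's specific algorithm), a point can be flagged when it deviates from a rolling mean by more than k rolling standard deviations:

```python
import statistics

def dynamic_threshold_flags(series, window=5, k=3.0):
    """Flag points whose deviation from the rolling mean exceeds
    k rolling standard deviations (a simple dynamic threshold)."""
    flags = []
    for i, x in enumerate(series):
        hist = series[max(0, i - window):i]
        if len(hist) < 2:          # not enough history yet
            flags.append(False)
            continue
        mu = statistics.mean(hist)
        sd = statistics.pstdev(hist) or 1e-9  # guard against zero variance
        flags.append(abs(x - mu) > k * sd)
    return flags

series = [1.0, 1.2, 0.9, 1.1, 1.0, 9.0, 1.0]
flags = dynamic_threshold_flags(series)
print(flags)
```

The threshold adapts as the window slides, so the same rule tolerates gradual drift while still catching the spike at 9.0.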
3. Automated Machine Learning for Fault Diagnosis Using Multimodal Mel-Spectrogram and Vibration Data
Authors: Zehao Li, Xuting Zhang, Hongqi Lin, Wu Qin, Junyu Qi, Zhuyun Chen, Qiang Liu. Computer Modeling in Engineering & Sciences, 2026, Issue 2, pp. 471-498 (28 pages)
To ensure the safe and stable operation of rotating machinery, intelligent fault diagnosis methods hold significant research value. However, existing diagnostic approaches largely rely on manual feature extraction and expert experience, which limits their adaptability under variable operating conditions and strong-noise environments, severely affecting the generalization capability of diagnostic models. To address this issue, this study proposes a multimodal fusion fault diagnosis framework based on Mel-spectrograms and automated machine learning (AutoML). The framework first extracts fault-sensitive Mel time-frequency features from acoustic signals and fuses them with statistical features of vibration signals to construct complementary fault representations. On this basis, automated machine learning techniques are introduced to enable end-to-end diagnostic workflow construction and optimal model configuration acquisition. Finally, diagnostic decisions are achieved by automatically integrating the predictions of multiple high-performance base models. Experimental results on a centrifugal pump vibration and acoustic dataset demonstrate that the proposed framework achieves high diagnostic accuracy under noise-free conditions and maintains strong robustness under noisy interference, validating its efficiency, scalability, and practical value for rotating machinery fault diagnosis.
Keywords: automated machine learning; mechanical fault diagnosis; feature engineering; multimodal data
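The Mel time-frequency features above rest on the mel scale, a standard perceptual frequency mapping. A small sketch of computing mel-spaced filter-bank band edges, assuming the common HTK formula mel = 2595 * log10(1 + f/700):

```python
import math

def hz_to_mel(f):
    """HTK-style mel scale: perceptual pitch spacing used when building
    Mel-spectrogram filter banks."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_filter_edges(f_lo, f_hi, n_bands):
    """Band edges equally spaced on the mel scale, mapped back to Hz."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    mels = [m_lo + i * (m_hi - m_lo) / (n_bands + 1) for i in range(n_bands + 2)]
    # invert the mel formula to get Hz positions
    return [700.0 * (10 ** (m / 2595.0) - 1.0) for m in mels]

edges = mel_filter_edges(0.0, 8000.0, n_bands=10)
print(round(edges[0]), round(edges[-1]))  # → 0 8000
```

Because the spacing is linear in mel rather than in Hz, low-frequency bands end up narrow and high-frequency bands wide, mimicking auditory resolution.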
4. Big Data-Driven Federated Learning Model for Scalable and Privacy-Preserving Cyber Threat Detection in IoT-Enabled Healthcare Systems
Authors: Noura Mohammed Alaskar, Muzammil Hussain, Saif Jasim Almheiri, Atta-ur-Rahman, Adnan Khan, Khan M. Adnan. Computers, Materials & Continua, 2026, Issue 4, pp. 793-816 (24 pages)
The increasing number of interconnected devices and the incorporation of smart technology into contemporary healthcare systems have significantly raised the attack surface for cyber threats. Early detection of threats is both necessary and complex, as these interconnected healthcare settings generate enormous amounts of heterogeneous data. Traditional Intrusion Detection Systems (IDS), which are generally centralized and machine learning-based, often fail to address the rapidly changing nature of cyberattacks and are challenged by ethical concerns related to patient data privacy. Moreover, traditional AI-driven IDS usually face challenges in handling large-scale, heterogeneous healthcare data while ensuring data privacy and operational efficiency. To address these issues, emerging technologies such as Big Data Analytics (BDA) and Federated Learning (FL) provide a hybrid framework for scalable, adaptive intrusion detection in IoT-driven healthcare systems. Big data techniques enable processing of large-scale, high-dimensional healthcare data, and FL can train a model in a decentralized manner without transferring raw data, thereby maintaining privacy between institutions. This research proposes a privacy-preserving Federated Learning-based model that efficiently detects cyber threats in connected healthcare systems while ensuring distributed big data processing, privacy, and compliance with ethical regulations. To strengthen the reliability of the reported findings, the results were validated using cross-dataset testing and 95% confidence intervals derived from bootstrap analysis, confirming consistent performance across heterogeneous healthcare data distributions. The proposed global model achieves a test accuracy of 99.93% ± 0.03 (95% CI) and a miss rate of only 0.07% ± 0.02, representing state-of-the-art performance in privacy-preserving intrusion detection. The proposed FL-driven IDS framework thus takes a significant step toward securing next-generation healthcare infrastructures by combining scalability, privacy, adaptability, early-detection capabilities, and ethical data management.
Keywords: intrusion detection systems; cyber threat detection; explainable AI; big data analytics; federated learning
5. Harnessing deep learning for the discovery of latent patterns in multi-omics medical data
Authors: Okechukwu Paul-Chima Ugwu, Fabian C. Ogenyi, Chinyere Nkemjika Anyanwu, Melvin Nnaemeka Ugwu, Esther Ugo Alum, Mariam Basajja, Joseph Obiezu Chukwujekwu Ezeonwumelu, Daniel Ejim Uti, Ibe Michael Usman, Chukwuebuka Gabriel Eze, Simeon Ikechukwu Egba. Medical Data Mining, 2026, Issue 1, pp. 32-45 (14 pages)
With the rapid growth of biomedical data, particularly multi-omics data spanning genomics, transcriptomics, proteomics, metabolomics, and epigenomics, medical research and clinical decision-making confront both new opportunities and obstacles. The huge and diversified nature of these datasets cannot always be managed using traditional data analysis methods. As a consequence, deep learning has emerged as a strong tool for analysing omics data due to its ability to handle complex and non-linear relationships. This paper explores the fundamental concepts of deep learning and how they are used in multi-omics medical data mining. We demonstrate how autoencoders, variational autoencoders, multimodal models, attention mechanisms, transformers, and graph neural networks enable pattern analysis and recognition across omics data. Deep learning has been found to be effective in illness classification, biomarker identification, gene network learning, and therapeutic efficacy prediction. We also consider critical problems such as data quality, model explainability, reproducibility of findings, and computational power requirements, as well as future directions: combining omics with clinical and imaging data, explainable AI, federated learning, and real-time diagnostics. Overall, this study emphasises the need for cross-disciplinary collaboration to advance deep learning-based multi-omics research for precision medicine and the understanding of complicated disorders.
Keywords: deep learning; multi-omics integration; biomedical data mining; precision medicine; graph neural networks; autoencoders and transformers
6. A Decision Method for Multivariate Power Data Storage Optimization Based on the Fusion of Random Forest and Q-learning
Authors: Ye Xueshun, Jia Dongli, Zhou Jun, Tang Ying, Jia Zihao. Science Technology and Engineering (PKU Core Journal), 2026, Issue 3, pp. 1065-1074 (10 pages)
Large-scale, diverse power data storage faces bottlenecks of low efficiency and insufficient memory capacity. Traditional storage optimization methods such as data indexing and data compression each have strengths and weaknesses, and how to apply them effectively to power data storage remains a research difficulty. To address this problem, a decision method for multivariate power data storage optimization that fuses random forest and Q-learning is proposed. Its key techniques are as follows. First, a storage-optimization strategy decision model based on an improved random forest algorithm is proposed; an information-gain method is introduced to comprehensively evaluate factors such as data access frequency, query time, storage speed, and data redundancy rate, and to decide among direct storage, indexed storage, and compressed storage. Second, a data storage algorithm decision model based on an improved Q-learning algorithm is proposed; multi-scale learning, prioritized experience replay, and positive/negative reward mechanisms are introduced to decide which index algorithm to apply for indexed storage and which compression algorithm to apply for compressed storage. The method effectively combines the technical advantages of data indexing and data compression, substantially improving storage efficiency and saving storage space, and provides a new solution for managing large-scale multivariate power data.
Keywords: random forest algorithm; Q-learning algorithm; data storage optimization; data indexing algorithm; data compression algorithm
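A minimal tabular Q-learning sketch of the storage-decision idea (the toy states, actions, and rewards are invented for illustration; the paper's multi-scale learning and prioritized experience replay mechanisms are omitted):

```python
def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# Toy storage decision: states are data-access profiles, actions are storage methods.
actions = ["direct", "index", "compress"]
Q = {s: {a: 0.0 for a in actions} for s in ["hot", "cold"]}

# Frequently accessed ("hot") data is rewarded for indexed storage,
# rarely accessed ("cold") data for compressed storage.
for _ in range(50):
    q_update(Q, "hot", "index", reward=1.0, next_state="hot")
    q_update(Q, "cold", "compress", reward=1.0, next_state="cold")

print(max(Q["hot"], key=Q["hot"].get))   # → index
print(max(Q["cold"], key=Q["cold"].get)) # → compress
```

After enough rewarded updates the greedy policy reads the best storage method straight out of the Q-table for each data profile.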
7. A Survey of Federated Learning: Advances in Architecture, Synchronization, and Security Threats
Authors: Faisal Mahmud, Fahim Mahmud, Rashedur M. Rahman. Computers, Materials & Continua, 2026, Issue 3, pp. 1-87 (87 pages)
Federated Learning (FL) has become a leading decentralized solution that enables multiple clients to train a model collaboratively without directly sharing raw data, making it suitable for privacy-sensitive applications such as healthcare, finance, and smart systems. As the field continues to evolve, its research landscape has become more complex and scattered, covering different system designs, training methods, and privacy techniques. This survey is organized around three core challenges: how data is distributed, how models are synchronized, and how to defend against attacks. It provides a structured and up-to-date review of FL research from 2023 to 2025, offering a unified taxonomy that categorizes works by data distribution (Horizontal FL, Vertical FL, Federated Transfer Learning, and Personalized FL), training synchronization (synchronous and asynchronous FL), optimization strategies, and threat models (data leakage and poisoning attacks). In particular, we summarize the latest contributions in Vertical FL frameworks for secure multi-party learning, communication-efficient Horizontal FL, and domain-adaptive Federated Transfer Learning. Furthermore, we examine synchronization techniques addressing system heterogeneity, including straggler mitigation in synchronous FL and staleness management in asynchronous FL. The survey covers security threats in FL, such as gradient inversion, membership inference, and poisoning attacks, as well as defense strategies including privacy-preserving aggregation and anomaly detection. The paper concludes by outlining unresolved issues and highlighting challenges in personalized models, scalability, and real-world adoption.
Keywords: federated learning (FL); horizontal federated learning (HFL); vertical federated learning (VFL); federated transfer learning (FTL); personalized federated learning; synchronous federated learning (SFL); asynchronous federated learning (AFL); data leakage; poisoning attacks; privacy-preserving machine learning
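As a concrete reference point for the synchronous-FL material surveyed here, a minimal FedAvg aggregation step, weighting clients by local dataset size (a generic sketch, not tied to any surveyed framework):

```python
def fedavg(client_weights, client_sizes):
    """Synchronous FedAvg: the server averages client model weights,
    weighting each client by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients with 100 and 300 samples: the larger client dominates.
global_w = fedavg([[1.0, 2.0], [3.0, 4.0]], [100, 300])
print(global_w)  # → [2.5, 3.5]
```

Only the weight vectors cross the network; raw client data never leaves its owner, which is the privacy property the survey's taxonomy builds on.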
8. FedDPL: Federated Dynamic Prototype Learning for Privacy-Preserving Malware Analysis across Heterogeneous Clients
Authors: Danping Niu, Yuan Ping, Chun Guo, Xiaojun Wang, Bin Hao. Computers, Materials & Continua, 2026, Issue 3, pp. 1989-2014 (26 pages)
With the increasing complexity of malware attack techniques, traditional detection methods face significant challenges, such as privacy preservation, data heterogeneity, and a lack of category information. To address these issues, we propose Federated Dynamic Prototype Learning (FedDPL) for malware classification, integrating Federated Learning with a specifically designed K-means. Under the Federated Learning framework, model training occurs locally without data sharing, effectively protecting user data privacy and preventing the leakage of sensitive information. Furthermore, to tackle the challenges of data heterogeneity and the lack of category information, FedDPL introduces a dynamic prototype learning mechanism, which adaptively adjusts the clustering prototypes in both position and number. Thus, the dependency on predefined category numbers in typical K-means and its variants can be significantly reduced, resulting in improved clustering performance and, in theory, more accurate detection of malicious behavior. Experimental results confirm that FedDPL excels in malware classification tasks, demonstrating superior accuracy, robustness, and privacy protection.
Keywords: malware classification; data heterogeneity; federated learning; clustering; differential privacy
9. A Dynamic Masking-Based Multi-Learning Framework for Sparse Classification
Authors: Woo Hyun Park, Dong Ryeol Shin. Computers, Materials & Continua, 2026, Issue 3, pp. 1365-1380 (16 pages)
With the recent increase in data volume and diversity, traditional text representation techniques struggle to capture context, particularly in environments with sparse data. To address these challenges, this study proposes a new model, the Masked Joint Representation Model (MJRM). MJRM approximates the original hypothesis by leveraging multiple elements in a limited context, dynamically adapting to changes in characteristics based on the data distribution through three main components. First, masking-based representation learning, termed selective dynamic masking, integrates topic modeling and sentiment clustering to generate and train multiple instances across different data subsets, whose predictions are then aggregated with optimized weights; this design alleviates sparsity, suppresses noise, and preserves contextual structures. Second, regularization-based improvements are applied. Third, techniques for addressing sparse data are used to perform final inference. As a result, MJRM improves performance by up to 4% compared to existing AI techniques. In our experiments, we analyzed the contribution of each factor, demonstrating that masking, dynamic learning, and aggregation of multiple instances complement each other to improve performance. This shows that a masking-based multi-learning strategy is effective for context-aware sparse text classification and remains useful in challenging situations such as data shortage or data distribution variations. We expect the approach can be extended to diverse fields such as sentiment analysis, spam filtering, and domain-specific document classification.
Keywords: text classification; dynamic learning; contextual features; data sparsity; masking-based representation
10. Artificial intelligence-assisted non-metallic inclusion particle analysis in advanced steels using machine learning: A review
Authors: Gonghao Lian, Xiaoming Liu, Qiang Wang, Chunguang Shen, Yi Wang, Wangzhong Mu. International Journal of Minerals, Metallurgy and Materials, 2026, Issue 2, pp. 401-416 (16 pages)
The detection and characterization of non-metallic inclusions are essential for clean steel production. Recently, imaging analysis combined with high-dimensional data processing of metallic materials using artificial intelligence (AI)-based machine learning (ML) has developed rapidly, achieving impressive results in inclusion classification in process metallurgy. The present study surveys ML modeling of inclusion prediction in advanced steels, including the detection, classification, and feature prediction of inclusions in different steel grades. Studies on clean steel with different features based on data and image analysis via ML are summarized. Regarding data analysis, the ML-based inclusion prediction methodology establishes a connection between experimental parameters and inclusion characteristics and analyzes the importance of the experimental parameters. Regarding image analysis, the focus is placed on the classification of different types of inclusions via deep learning, in comparison with data analysis. Finally, further development of inclusion analyses using ML-based methods is recommended. This work paves the way for the application of AI-based methodologies in ultraclean-steel studies from a sustainable metallurgy perspective.
Keywords: machine learning; inclusion classification; image analysis; data analysis; clean steel
11. FedReg*: Addressing Non-Independent and Identically Distributed Challenges in Federated Learning
Authors: SHI Xiujin, ZHU Xiaolong, XIAO Wentao. Journal of Donghua University (English Edition), 2026, Issue 1, pp. 41-49 (9 pages)
In non-independent and identically distributed (non-IID) data environments, model performance often degrades significantly. To address this issue, two improvement methods are proposed: FedReg and FedReg*. FedReg is a method based on hybrid regularization aimed at enhancing federated learning in non-IID scenarios. It replaces traditional L2 regularization with a hybrid scheme that combines the advantages of L1 and L2 regularization, enabling feature selection while preventing overfitting. This method better adapts to the diverse data distributions of different clients, improving overall model performance. FedReg* combines hybrid regularization with weighted model aggregation: in addition to the benefits of hybrid regularization, it applies weighted averaging during model aggregation, calculating weights from the cosine similarity between each client gradient and the global gradient to distribute client contributions more reasonably. By considering variations in data quality and quantity among clients, FedReg* highlights the importance of key clients and enhances the model's generalization performance. These improvements enhance model accuracy and communication efficiency.
Keywords: federated learning; non-independent and identically distributed (non-IID) data; hybrid regularization; cosine similarity
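The cosine-similarity weighting idea behind FedReg* can be sketched as follows; the clipping and normalization choices are illustrative assumptions, not the paper's exact scheme:

```python
import math

def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similarity_weighted_avg(client_grads):
    """Weight each client gradient by its cosine similarity to the mean
    ("global") gradient, clipped to be non-negative, then combine."""
    dim = len(client_grads[0])
    global_grad = [sum(g[i] for g in client_grads) / len(client_grads)
                   for i in range(dim)]
    weights = [max(cosine(g, global_grad), 0.0) for g in client_grads]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    agg = [sum(w * g[i] for w, g in zip(weights, client_grads))
           for i in range(dim)]
    return agg, weights

# A client whose gradient points opposite the consensus gets weight ~0.
grads = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
agg, w = similarity_weighted_avg(grads)
print([round(x, 3) for x in w])
```

Clients aligned with the global direction keep roughly equal shares, while the dissenting third client is suppressed entirely, which is the intuition behind similarity-based contribution weighting.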
12. Research on Integrating Deep Learning-Based Vehicle Brand and Model Recognition into a Police Intelligence Analysis Platform
Authors: Shih-Lin Lin, Cheng-Wei Li. Computers, Materials & Continua, 2026, Issue 2, pp. 785-804 (20 pages)
This study focuses on developing a deep learning model capable of recognizing vehicle brands and models, integrated with a law enforcement intelligence platform to overcome the limitations of existing license plate recognition techniques, particularly in handling counterfeit, obscured, or absent plates. The research first entailed collecting, annotating, and classifying images of various vehicle models, leveraging image processing and feature extraction methodologies to train the model on Microsoft Custom Vision. Experimental results indicate that, for most brands and models, the system achieves stable and relatively high Precision, Recall, and Average Precision (AP). Furthermore, simulated tests involving illicit vehicles reveal that, even in cases of reassigned, concealed, or missing license plates, the model can rely on exterior body features to identify vehicles effectively, reducing dependence on plate-specific data. In practical law enforcement scenarios, these findings can accelerate investigations of stolen or forged plates and enhance overall accuracy. In conclusion, continued collection of vehicle images across broader model types, production years, and modification levels, along with refined annotation processes and parameter adjustment strategies, will further strengthen the method's applicability within law enforcement intelligence platforms, facilitating more precise and comprehensive vehicle recognition and control in real-world operations.
Keywords: deep learning; vehicle brand-model recognition; license plate anomalies (counterfeit/obscured); law enforcement intelligence; data augmentation
13. Dynamic UAV data fusion and deep learning for improved maize phenological-stage tracking (Cited by 1)
Authors: Ziheng Feng, Jiliang Zhao, Liunan Suo, Heguang Sun, Huiling Long, Hao Yang, Xiaoyu Song, Haikuan Feng, Bo Xu, Guijun Yang, Chunjiang Zhao. The Crop Journal, 2025, Issue 3, pp. 961-974 (14 pages)
Near real-time maize phenology monitoring is crucial for field management, cropping system adjustments, and yield estimation. Most phenological monitoring methods are post-seasonal and rely heavily on high-frequency time-series data, making them impractical on unmanned aerial vehicle (UAV) platforms due to the high cost of acquiring time-series UAV images and the shortage of UAV-based phenological monitoring methods. To address these challenges, we employed the Synthetic Minority Oversampling Technique (SMOTE) for sample augmentation, aiming to resolve the small-sample modelling problem. Moreover, we utilized enhanced "separation" and "compactness" feature selection methods to identify input features from multiple data sources, incorporating dynamic multi-source data fusion strategies involving vegetation indices (VI), color indices (CI), and texture features (TF). A two-stage neural network combining a Convolutional Neural Network (CNN) and a Long Short-Term Memory network (LSTM) is proposed to identify maize phenological stages (sowing, seedling, jointing, trumpet, tasseling, maturity, and harvesting) on UAV platforms. The results indicate that the dataset generated by SMOTE closely resembles the measured dataset. Among the dynamic data fusion strategies, the VI-TF combination proves most effective, followed by CI-TF and VI-CI. Notably, as more data sources are integrated, the model's demand for input features declines significantly. In particular, the CNN-LSTM model based on the fusion of all three data sources exhibited remarkable reliability across the three validation datasets. For Dataset 1 (Beijing Xiaotangshan, 2023: data from 12 UAV flight missions), the model achieved an overall accuracy (OA) of 86.53%, with precision (Pre), recall (Rec), F1 score (F1), false acceptance rate (FAR), and false rejection rate (FRR) of 0.89, 0.89, 0.87, 0.11, and 0.11, respectively. The model also showed strong generalizability on Dataset 2 (Beijing Xiaotangshan, 2023: data from 6 UAV flight missions) and Dataset 3 (Beijing Xiaotangshan, 2022: data from 4 UAV flight missions), with OAs of 89.4% and 85%, respectively. Meanwhile, the model has a low demand for input features, requiring only 54.55% (99) of all features. These findings offer novel insights into near real-time crop phenology monitoring and provide technical support for agricultural field management and cropping system adaptation.
Keywords: near real-time; maize phenology; deep learning; UAV; multi-source data fusion
14. Design of a Private Cloud Platform for Distributed Logging Big Data Based on a Unified Learning Model of Physics and Data (Cited by 1)
Authors: Cheng Xi, Fu Haicheng, Tursyngazy Mahabbat. Applied Geophysics, 2025, Issue 2, pp. 499-510, 560 (13 pages)
Well logging technology has accumulated a large amount of historical data through four generations of technological development, which forms the basis of well logging big data and digital assets. However, the value of these data has not been well stored, managed, and mined. The development of cloud computing technology provides a rare opportunity for a logging big data private cloud. Meanwhile, the traditional petrophysical evaluation and interpretation model has encountered great challenges when facing new evaluation objects, and research on integrating distributed storage, processing, and learning functions into a logging big data private cloud has not yet been carried out. This work establishes a distributed logging big data private cloud platform centered on a unified learning model, which achieves distributed storage and processing of logging big data and facilitates the learning of novel knowledge patterns via a unified logging learning model integrating physical simulation and data models in a large-scale function space, thus resolving the geo-engineering evaluation problem of geothermal fields. Following the research idea of "logging big data cloud platform - unified logging learning model - large function space - knowledge learning and discovery - application", the theoretical foundation of the unified learning model, the cloud platform architecture, data storage and learning algorithms, computing power allocation and platform monitoring, platform stability, and data security are analyzed. The designed platform realizes parallel distributed storage and processing of data and learning algorithms. The feasibility of constructing a well logging big data cloud platform based on a unified learning model of physics and data is analyzed in terms of the structure, ecology, management, and security of the cloud platform. The case study shows that the platform has obvious technical advantages over traditional logging evaluation methods in terms of knowledge discovery, sharing of data, software, and results, accuracy, speed, and complexity.
Keywords: unified logging learning model; logging big data; private cloud; machine learning
15. Prediction of radionuclide diffusion enabled by missing data imputation and ensemble machine learning (Cited by 1)
Authors: Jun-Lei Tian, Jia-Xing Feng, Jia-Cong Shen, Lei Yao, Jing-Yan Wang, Tao Wu, Yao-Lin Zhao. Nuclear Science and Techniques, 2025, Issue 10, pp. 47-61 (15 pages)
Missing values in radionuclide diffusion datasets can undermine the predictive accuracy and robustness of machine learning (ML) models. In this study, a regression-based missing data imputation method using a light gradient boosting machine (LGBM) algorithm was employed to impute more than 60% of the missing data, establishing a radionuclide diffusion dataset containing 16 input features and 813 instances. The effective diffusion coefficient (D_e) was predicted using ten ML models. The predictive accuracy of the ensemble meta-models, namely LGBM-extreme gradient boosting (XGB) and LGBM-categorical boosting (CatB), surpassed that of the other ML models, with R^2 values of 0.94. The models were applied to predict the D_e values of EuEDTA^- and HCrO4^- in saturated compacted bentonites at compactions ranging from 1200 to 1800 kg/m^3, measured using a through-diffusion method. The generalization ability of the LGBM-XGB model surpassed that of LGBM-CatB in predicting the D_e of HCrO4^-. Shapley additive explanations identified total porosity as the most significant influencing factor, and partial dependence plot analysis yielded clearer results in the univariate correlation analysis. This study provides a regression imputation technique to refine radionuclide diffusion datasets, offering deeper insights into the diffusion mechanism of radionuclides and supporting the safety assessment of the geological disposal of high-level radioactive waste.
Keywords: Machine learning; Radionuclide diffusion; Bentonite; Regression imputation; Missing data; Diffusion experiments
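The regression-imputation step described in the abstract can be sketched in plain NumPy. As an assumption for illustration only, the paper's LGBM regressor is replaced here by a linear least-squares fit; the idea is the same: learn the incomplete column from the complete ones, then predict the missing entries.

```python
import numpy as np

def regression_impute(X, target_col):
    """Impute NaNs in one column by regressing it on the other columns.

    A linear least-squares model stands in for the paper's LGBM regressor.
    """
    mask = np.isnan(X[:, target_col])
    predictors = [c for c in range(X.shape[1]) if c != target_col]
    # Fit on rows where the target is observed, with a bias term appended.
    A = np.column_stack([X[~mask][:, predictors], np.ones((~mask).sum())])
    coef, *_ = np.linalg.lstsq(A, X[~mask, target_col], rcond=None)
    # Predict the missing entries and fill them in on a copy.
    B = np.column_stack([X[mask][:, predictors], np.ones(mask.sum())])
    X = X.copy()
    X[mask, target_col] = B @ coef
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 3] = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=100)
X[::5, 3] = np.nan                     # knock out 20% of the target feature
X_filled = regression_impute(X, target_col=3)
print(np.isnan(X_filled).any())        # False
```

A tree-based regressor (LGBM, XGB) would replace the `lstsq` call when the feature relationships are nonlinear, as they are in real diffusion data.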
Pore pressure prediction based on conventional well logs and seismic data using an advanced machine learning approach (cited by 1)
16
Authors: Muhsan Ehsan, Umar Manzoor, Rujun Chen, Muyyassar Hussain, Kamal Abdelrahman, Ahmed E. Radwan, Jar Ullah, Muhammad Khizer Iftikhar, Farooq Arshad. Journal of Rock Mechanics and Geotechnical Engineering, 2025, Issue 5, pp. 2727-2740 (14 pages)
Pore pressure is a decisive measure for assessing a reservoir's geomechanical properties, ensuring safe and efficient drilling operations, and optimizing reservoir characterization and production. Conventional approaches sometimes fail to capture the complex and persistent relationships between pore pressure and formation properties in heterogeneous reservoirs. This study presents a novel machine learning-optimized pore pressure prediction method for limited datasets, particularly in complex formations. The method addresses the limitations of conventional approaches by leveraging the capability of machine learning to learn complex data relationships. It integrates the best-performing Gradient Boosting Regressor (GBR) algorithm to model pore pressure at wells, then utilizes Continuous Wavelet Transformation (CWT) of the seismic dataset for spatial analysis, and finally employs a deep neural network for robust and precise pore pressure modeling over the whole volume. In the second stage, to capture the spatial variations of pore pressure in the thin Khadro Formation sand reservoir across the entire subsurface area, a three-dimensional pore pressure prediction is conducted using CWT. The relationship between the CWT and geomechanical properties is then established through supervised machine learning models at well locations to predict the uncertainties in pore pressure. Among all intelligent regression techniques developed using petrophysical and elastic properties for pore pressure prediction, the GBR provided exceptional results, validated by evaluation metrics with an R^(2) score of 0.91 between the calibrated and predicted pore pressure. Via the deep neural network, the relationship between CWT resultant traces and predicted pore pressure is established to analyze the spatial variation.
Keywords: Pore pressure; Conventional well logs; Seismic data; Machine learning; Complex formations
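A minimal sketch of the well-level step above: a Gradient Boosting Regressor fitted to stand-in log features and scored with R^(2). The feature names and the synthetic pressure relation are assumptions for illustration, not the paper's data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
# Stand-in well-log features (assumed, illustrative ranges).
sonic = rng.uniform(60, 120, n)       # sonic travel time
density = rng.uniform(2.0, 2.7, n)    # bulk density
porosity = rng.uniform(0.05, 0.35, n)
# Synthetic pore pressure with a nonlinear dependence plus noise.
pp = 0.05 * sonic + 8.0 * porosity - 3.0 * np.log(density) + rng.normal(0, 0.2, n)

X = np.column_stack([sonic, density, porosity])
X_tr, X_te, y_tr, y_te = train_test_split(X, pp, test_size=0.25, random_state=0)

gbr = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0)
gbr.fit(X_tr, y_tr)
print(f"R^2 = {r2_score(y_te, gbr.predict(X_te)):.2f}")
```

The paper's full pipeline then feeds such well-calibrated predictions, together with CWT attributes of the seismic volume, into a deep neural network for the 3D extrapolation.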
Hybrid Teaching Reform and Practice in Big Data Collection and Preprocessing Courses Based on the Bosi Smart Learning Platform (cited by 1)
17
Authors: Yang Wang, Xuemei Wang, Wanyan Wang. Journal of Contemporary Educational Research, 2025, Issue 2, pp. 96-100 (5 pages)
This study examines the Big Data Collection and Preprocessing course at Anhui Institute of Information Engineering, implementing a hybrid teaching reform using the Bosi Smart Learning Platform. The proposed hybrid model follows a "three-stage" and "two-subject" framework, incorporating a structured design for teaching content and assessment methods before, during, and after class. Practical results indicate that this approach significantly enhances teaching effectiveness and improves students' learning autonomy.
Keywords: Big Data Collection and Preprocessing; Bosi Smart Learning Platform; Hybrid teaching; Teaching reform
On the Data Quality and Imbalance in Machine Learning-based Design and Manufacturing-A Systematic Review
18
Authors: Jiarui Xie, Lijun Sun, Yaoyao Fiona Zhao. Engineering, 2025, Issue 2, pp. 105-131 (27 pages)
Machine learning (ML) has recently enabled many modeling tasks in design, manufacturing, and condition monitoring due to its unparalleled learning ability using existing data. Data have become the limiting factor when implementing ML in industry. However, there is no systematic investigation of how data quality can be assessed and improved for ML-based design and manufacturing. The aim of this survey is to uncover the data challenges in this domain and review the techniques used to resolve them. To establish the background for the subsequent analysis, crucial data terminologies in ML-based modeling are reviewed and categorized into data acquisition, management, analysis, and utilization. Thereafter, the concepts and frameworks established to evaluate data quality and imbalance, including data quality assessment, data readiness, information quality, data biases, fairness, and diversity, are further investigated. The root causes and types of data challenges, including human factors, complex systems, complicated relationships, lack of data quality, data heterogeneity, data imbalance, and data scarcity, are identified and summarized. Methods to improve data quality and mitigate data imbalance, and their applications in this domain, are reviewed. This literature review focuses on two promising methods: data augmentation and active learning. The strengths, limitations, and applicability of the surveyed techniques are illustrated. The trends of data augmentation and active learning are discussed with respect to their applications, data types, and approaches. Based on this discussion, future directions for data quality improvement and data imbalance mitigation in this domain are identified.
Keywords: Machine learning; Design and manufacturing; Data quality; Data augmentation; Active learning
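Of the two methods the survey highlights, data augmentation for imbalanced tabular data is commonly done by SMOTE-style interpolation between minority-class neighbors. A minimal NumPy sketch of that core step (one of many variants the survey covers; the data here are synthetic):

```python
import numpy as np

def smote_like(X_min, n_new, k=5, rng=None):
    """Generate synthetic minority samples by interpolating toward
    randomly chosen nearest neighbours, the core step of SMOTE."""
    rng = rng or np.random.default_rng(0)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Distances from sample i to every other minority sample.
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                    # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(1)
minority = rng.normal(loc=3.0, scale=0.5, size=(20, 2))
synthetic = smote_like(minority, n_new=30, rng=rng)
print(synthetic.shape)  # (30, 2)
```

Because each synthetic point is a convex combination of two real minority samples, it always lies within the per-feature range of the original minority class.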
Efficient socket-based data transmission method and implementation in deep learning
19
Authors: Wei Xin-Jian, Li Shu-Ping, Yang Wu-Yang, Zhang Xiang-Yang, Li Hai-Shan, Xu Xin, Wang Nan, Fu Zhanbao. Applied Geophysics, 2025, Issue 4, pp. 1341-1350, 1499-1500 (12 pages)
Deep learning algorithms, which have been increasingly applied in the field of petroleum geophysical prospecting, have achieved good results in improving efficiency and accuracy in test applications. To play a greater role in actual production, these algorithm modules must be integrated into software systems and used more often in actual production projects. Deep learning frameworks such as TensorFlow and PyTorch take Python as the core architecture, while application programs are mainly written in Java, C#, and other programming languages. During integration, the seismic data read by the Java and C# data interfaces must be transferred to the Python main program module. The data exchange methods between Java, C#, and Python include shared memory, shared directories, and so on. However, these methods have the disadvantages of low transmission efficiency and unsuitability for asynchronous networks. Considering the large volume of seismic data and the need for network support in deep learning, this paper proposes a Socket-based method of transmitting seismic data. By exploiting Socket's efficient cross-network, long-distance transmission, this approach solves the problem of inefficient transmission of underlying data when integrating a deep learning algorithm module into a software system. Furthermore, actual production applications show that this method effectively overcomes the data transmission shortcomings of shared memory, shared directories, and other modes while improving the transmission efficiency of massive seismic data across modules at the bottom of the software.
Keywords: Socket; Deep learning; Data transfer; Seismic data; Thread pool; River prediction
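The Socket-based transfer the paper describes can be sketched with the Python standard library alone. The framing protocol here (an 8-byte big-endian length header before each payload) is an assumption for illustration, not the paper's exact wire format:

```python
import socket
import struct
import threading

def send_message(conn, data: bytes):
    """Send a length-prefixed byte payload (length as 8-byte big-endian)."""
    conn.sendall(struct.pack(">Q", len(data)) + data)

def _recv_exact(conn, n):
    """Loop on recv() until exactly n bytes have arrived."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(min(n - len(buf), 65536))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

def recv_message(conn) -> bytes:
    """Receive one length-prefixed payload."""
    (length,) = struct.unpack(">Q", _recv_exact(conn, 8))
    return _recv_exact(conn, length)

# Demo: loop a stand-in seismic trace through a local echo server thread.
server = socket.socket()
server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def serve():
    conn, _ = server.accept()
    with conn:
        send_message(conn, recv_message(conn))   # echo the payload back

threading.Thread(target=serve, daemon=True).start()

payload = bytes(range(256)) * 1000             # ~256 KB stand-in for trace data
with socket.create_connection(("127.0.0.1", port)) as client:
    send_message(client, payload)
    echoed = recv_message(client)
print(echoed == payload)  # True
```

The explicit length prefix and the `_recv_exact` loop are what make the transfer robust over a real network, where a single `recv()` may return only part of the stream.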
Effective and efficient handling of missing data in supervised machine learning
20
Authors: Peter Ayokunle Popoola, Jules-Raymond Tapamo, Alain Guy Honoré Assounga. Data Science and Management, 2025, Issue 3, pp. 361-373 (13 pages)
The prevailing consensus in the statistical literature is that multiple imputation is generally the most suitable method for addressing missing data in statistical analyses, whereas a complete case analysis is deemed appropriate only when the rate of missingness is negligible or when the missingness mechanism is missing completely at random (MCAR). This study investigates the applicability of this consensus within the context of supervised machine learning, with particular emphasis on the interactions between the imputation method, missingness mechanism, and missingness rate. Furthermore, we examine the time efficiency of these state-of-the-art imputation methods, considering the time-sensitive nature of certain machine learning applications. Utilizing ten real-world datasets, we introduced missingness at rates ranging from approximately 5% to 75% under the MCAR, missing at random (MAR), and missing not at random (MNAR) mechanisms. We subsequently address missing data using five methods: complete case analysis (CCA), mean imputation, hot deck imputation, regression imputation, and multiple imputation (MI). Statistical tests are conducted on the machine learning outcomes, and the findings are presented and analyzed. Our investigation reveals that in nearly all scenarios, CCA performs comparably to MI, even with substantial levels of missingness under the MAR and MNAR conditions and with missingness in the output variable for regression problems. Under some conditions, CCA surpasses MI in performance. Thus, given the considerable computational demands associated with MI, the application of CCA is recommended within the broader context of supervised machine learning, particularly in big-data environments.
Keywords: Classification; Imputation; Learning; Missing data; Prediction
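The CCA-versus-imputation comparison can be reproduced in miniature: inject MCAR missingness into synthetic data, then score complete case analysis against mean imputation on the same classification task. The data generator and the logistic model are illustrative assumptions, not the paper's ten real-world datasets:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 400
X = rng.normal(size=(n, 3))
y = (X @ np.array([1.5, -1.0, 0.5]) + rng.normal(0, 0.5, n) > 0).astype(int)

# Inject ~20% MCAR missingness into one feature.
X_miss = X.copy()
miss = rng.random(n) < 0.2
X_miss[miss, 0] = np.nan

# Complete case analysis: drop rows containing any missing value.
cca_rows = ~np.isnan(X_miss).any(axis=1)
acc_cca = cross_val_score(LogisticRegression(), X_miss[cca_rows], y[cca_rows], cv=5).mean()

# Mean imputation: fill NaNs with the observed column mean.
X_mean = X_miss.copy()
X_mean[miss, 0] = np.nanmean(X_miss[:, 0])
acc_mean = cross_val_score(LogisticRegression(), X_mean, y, cv=5).mean()

print(f"CCA accuracy:  {acc_cca:.2f}")
print(f"Mean-imputed:  {acc_mean:.2f}")
```

Under MCAR, as here, both approaches tend to score similarly; the paper's contribution is showing that CCA also holds up under MAR and MNAR, where theory would favor multiple imputation.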