Journal articles
1,480 articles found
TBM big data preprocessing method in machine learning and its application to tunneling
1
Authors: Xinyue Zhang, Xiaoping Zhang, Quansheng Liu, Weiqiang Xie, Shaohui Tang, Zengmao Wang. Journal of Rock Mechanics and Geotechnical Engineering, 2025, Issue 8, pp. 4762-4783, 22 pages
The big data generated by tunnel boring machines (TBMs) are widely used to reveal complex rock-machine interactions through machine learning (ML) algorithms, and data preprocessing plays a crucial role in improving ML accuracy. To this end, the present study proposes a TBM big data preprocessing method for ML that emphasizes accurate division of the TBM tunneling cycle and optimized feature extraction. Its effectiveness was demonstrated by applying it to TBM performance prediction with data collected from a TBM water conveyance tunnel in China. First, the Score-Kneedle (S-K) method was proposed to divide a TBM tunneling cycle into five phases. On 500 TBM tunneling cycles, the S-K method correctly divided all five phases in 458 cycles (accuracy 91.6%), outperforming the conventional duration division method (accuracy 74.2%). The S-K method also correctly divided the stable phase in 493 cycles (accuracy 98.6%), outperforming two state-of-the-art division methods: the histogram discriminant method (accuracy 94.6%) and the cumulative sum change point detection method (accuracy 92.8%). Second, features were extracted from the divided phases. Specifically, TBM tunneling resistances were extracted from the free rotating phase and the free advancing phase and subtracted from the total forces to represent the true rock-fragmentation forces. The secant slope and the mean value were extracted as features of the increasing phase and the stable phase, respectively. Finally, an ML model integrating a deep neural network and a genetic algorithm (GA-DNN) was established to learn from the preprocessed data. The GA-DNN used 6 secant slope features extracted from the increasing phase to predict the mean field penetration index (FPI) and torque penetration index (TPI) in the stable phase, helping TBM drivers make better decisions in advance. The results indicate that the proposed preprocessing method significantly improves prediction accuracy, raising the test-set R² of TPI from 0.7716 to 0.9178 and that of FPI from 0.7479 to 0.8842.
Keywords: Tunnel boring machine; Big data preprocessing; Division of tunneling cycle; Tunneling resistance; Machine learning
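The feature-extraction step described in the first abstract has three parts: resistance subtraction, the secant slope of the increasing phase, and the mean of the stable phase. The Python sketch below illustrates them. It is not the authors' code. It assumes the phase boundaries are already known (for example, from the S-K division) and takes the resistance as the mean thrust over a free-running phase. The function names and the `(start, end)` index pairs are hypothetical.

```python
from statistics import mean


def net_force(total, resistance):
    """Subtract the tunneling resistance from each total-force sample."""
    return [f - resistance for f in total]


def secant_slope(t, y, start, end):
    """Slope of the straight line joining samples `start` and `end` (inclusive)."""
    if t[end] == t[start]:
        raise ValueError("start and end times must differ")
    return (y[end] - y[start]) / (t[end] - t[start])


def extract_features(t, thrust, free_phase, increasing_phase, stable_phase):
    """Return (secant slope of the increasing phase, mean of the stable phase).

    Each *_phase argument is a hypothetical (start_index, end_index) pair,
    assumed to come from an earlier cycle-division step.
    """
    fs, fe = free_phase
    # Assumption: resistance = mean thrust while the machine runs freely.
    resistance = mean(thrust[fs:fe + 1])
    rock_force = net_force(thrust, resistance)
    slope = secant_slope(t, rock_force, *increasing_phase)
    ss, se = stable_phase
    return slope, mean(rock_force[ss:se + 1])
```

For a toy cycle `thrust = [10, 10, 10, 30, 50, 70, 70, 70]` sampled at `t = 0..7` with phases `(0, 2)`, `(2, 5)`, `(5, 7)`, the resistance is 10. The increasing-phase secant slope is then 20 per time step, and the stable-phase mean net force is 60.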
Power Data Preprocessing Method of Mountain Wind Farm Based on POT-DBSCAN (cited by 2)
2
Authors: Anfeng Zhu, Zhao Xiao, Qiancheng Zhao. Energy Engineering (EI), 2021, Issue 3, pp. 549-563, 15 pages
Because wind speed and wind direction change frequently, wind turbine (WT) power prediction based on traditional data preprocessing methods has low accuracy. This paper proposes a data preprocessing method that combines POT with DBSCAN (POT-DBSCAN) to improve the prediction efficiency of the wind power prediction model. First, using WT data under normal operating conditions, a WT power prediction model is built by combining the Particle Swarm Optimization (PSO) algorithm with a BP neural network (PSO-BP). Second, the wind-power data obtained from the supervisory control and data acquisition (SCADA) system are preprocessed with the POT-DBSCAN method. Then, the PSO-BP model performs power prediction on the preprocessed data. Finally, the necessity of preprocessing is verified with evaluation indexes. The case analysis shows that predictions based on POT-DBSCAN preprocessing are better than those based on the quartile method. Therefore, this method can improve the accuracy of both the data and the prediction model.
Keywords: Wind turbine; SCADA data; data preprocessing method; power prediction
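The DBSCAN half of POT-DBSCAN can be illustrated with a small, dependency-free implementation that labels low-density points as noise. In this setting the points would be (wind speed, power) pairs. This is a generic DBSCAN sketch with assumed parameters `eps` and `min_pts`. The POT step and the paper's parameter choices are not reproduced.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: cluster id >= 0, or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels


def remove_noise(points, eps, min_pts):
    """Keep only the points DBSCAN assigns to some cluster."""
    labels = dbscan(points, eps, min_pts)
    return [p for p, label in zip(points, labels) if label != -1]
```

Because `eps` is a single distance, wind speed and power would in practice need to be scaled to comparable ranges before clustering. The quadratic neighbor search is fine for illustration but not for full SCADA archives.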
Data preprocessing and preliminary results of the Moon-based Ultraviolet Telescope on the CE-3 lander (cited by 4)
3
Authors: Wei-Bin Wen, Fang Wang, Chun-Lai Li, Jing Wang, Li Cao, Jian-Jun Liu, Xu Tan, Yuan Xiao, Qiang Fu, Yan Su, Wei Zuo. Research in Astronomy and Astrophysics (SCIE, CAS, CSCD), 2014, Issue 12, pp. 1674-1681, 8 pages
The Moon-based Ultraviolet Telescope (MUVT) is one of the payloads on the Chang'e-3 (CE-3) lunar lander. Because the Moon has no atmospheric disturbances and rotates slowly, we can make long-term continuous observations of a series of important celestial objects in the near-ultraviolet band (245-340 nm) and survey selected areas of the sky, which cannot be done from Earth. By analyzing MUVT image data, we can find characteristic changes in celestial brightness over time and, after comparison with a physical model, deduce the radiation mechanisms and physical properties of these objects. To explain the scientific purposes of the MUVT, this article analyzes the preprocessing of MUVT image data and makes a preliminary evaluation of data quality. The results demonstrate that the methods used for data collection and preprocessing are effective, and that the Level 2A and 2B image data satisfy the requirements of follow-up scientific research.
Keywords: Chang'e-3 mission; the Moon-based Ultraviolet Telescope; data preprocessing; near ultraviolet band
Diabetes Type 2: Poincaré Data Preprocessing for Quantum Machine Learning (cited by 1)
4
Authors: Daniel Sierra-Sosa, Juan D. Arcila-Moreno, Begonya Garcia-Zapirain, Adel Elmaghraby. Computers, Materials & Continua (SCIE, EI), 2021, Issue 5, pp. 1849-1861, 13 pages
Quantum Machine Learning (QML) techniques have recently attracted massive interest. However, reported applications usually employ synthetic or well-known datasets. One of these techniques, the Variational Quantum Classifier (VQC), uses a hybrid approach that combines quantum and classical devices, and its development seems promising. Although VQC has been widely studied, implementing it for "real-world" datasets remains challenging on Noisy Intermediate-Scale Quantum (NISQ) devices. In this paper we propose a preprocessing pipeline based on Stokes parameters for data mapping. This pipeline improves prediction rates when VQC techniques are applied, making classification problems more feasible to solve on NISQ devices. By including feature selection techniques and geometrical transformations, enhanced quantum state preparation is achieved. The Stokes parameters also allow the data to be visualized on the Poincaré sphere. Our results show that the proposed techniques improve the classification score for the incidence of acute comorbid diseases in Type 2 Diabetes Mellitus patients. Using the VQC implementation available in IBM's Qiskit framework, we obtained accuracies of 70% and 72% with two and three qubits, respectively.
Keywords: Quantum machine learning; data preprocessing; Stokes parameters; Poincaré sphere
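Normalized Stokes parameters of fully polarized light lie on the unit (Poincaré) sphere, and every point on that sphere is the Bloch vector of a single-qubit state. The sketch below shows one way two min-max-scaled features could be mapped to such a point and then to qubit amplitudes. The specific mapping (one feature to the azimuth angle 2ψ, the other to the ellipticity angle 2χ) is an assumption for illustration. The paper's feature selection, transformations, and Qiskit circuit are not reproduced.

```python
import cmath
import math


def min_max(values):
    """Scale a feature column to [0, 1] (constant columns map to 0.5)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def stokes_from_features(a, b):
    """Map two features scaled to [0, 1] to normalized Stokes parameters (S1, S2, S3)."""
    two_psi = 2 * math.pi * a        # assumed: azimuth angle 2*psi in [0, 2*pi]
    two_chi = math.pi * (b - 0.5)    # assumed: ellipticity angle 2*chi in [-pi/2, pi/2]
    s1 = math.cos(two_chi) * math.cos(two_psi)
    s2 = math.cos(two_chi) * math.sin(two_psi)
    s3 = math.sin(two_chi)
    return s1, s2, s3


def qubit_amplitudes(s1, s2, s3):
    """Amplitudes (alpha, beta) of the qubit state whose Bloch vector is (s1, s2, s3)."""
    theta = math.acos(max(-1.0, min(1.0, s3)))  # polar angle from the S3 axis
    phi = math.atan2(s2, s1)                    # azimuth in the S1-S2 plane
    return math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)
```

The resulting `(alpha, beta)` pair is what a state-preparation routine would need to load. The round trip can be checked with 2·conj(α)·β = S1 + i·S2 and |α|² − |β|² = S3.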
DATA PREPROCESSING AND RE KERNEL CLUSTERING FOR LETTER
5
Authors: Zhu Changming, Gao Daqi. Journal of Electronics (China), 2014, Issue 6, pp. 552-564, 13 pages
Many classifiers and methods have been proposed for the letter recognition problem. Among them, clustering is widely used, but clustering only once is not adequate. Here, we adopt data preprocessing and a re kernel clustering method to tackle the letter recognition problem. To validate the effectiveness and efficiency of the proposed method, we introduce re kernel clustering into Kernel Nearest Neighbor classification (KNN), the Radial Basis Function Neural Network (RBFNN), and the Support Vector Machine (SVM). Furthermore, we compare re kernel clustering with one-time kernel clustering (referred to as kernel clustering for short). Experimental results show that re kernel clustering forms fewer and more feasible kernels and attains higher classification accuracy.
Keywords: data preprocessing; Kernel clustering; Kernel Nearest Neighbor (KNN); Re kernel clustering
Hybrid 1DCNN-Attention with Enhanced Data Preprocessing for Loan Approval Prediction
6
Authors: Yaru Liu, Huifang Feng. Journal of Computer and Communications, 2024, Issue 8, pp. 224-241, 18 pages
To reduce the risk of non-performing loans and losses, and to improve loan approval efficiency, an intelligent loan risk and approval prediction system is needed. This paper proposes a hybrid deep learning model for loan approval prediction that combines a 1DCNN-attention network with enhanced preprocessing techniques. The model consists of an enhanced data preprocessing stage followed by a stack of multiple hybrid modules. First, the enhanced preprocessing combines standardization, SMOTE oversampling, feature construction, recursive feature elimination (RFE), information value (IV), and principal component analysis (PCA). This removes the effects of data jitter and class imbalance and also eliminates redundant features while improving the feature representation. Next, a hybrid module combining a 1DCNN with an attention mechanism is proposed to extract local and global spatio-temporal features. Finally, comprehensive experiments show that the proposed model surpasses state-of-the-art baseline models across performance metrics including accuracy, precision, recall, F1 score, and AUC. The model helps automate the loan approval process and provides scientific guidance to financial institutions for loan risk control.
Keywords: Loan Approval Prediction; Deep Learning; One-Dimensional Convolutional Neural Network; Attention Mechanism; data preprocessing
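Of the preprocessing steps listed in the loan-approval abstract, SMOTE oversampling is the least self-explanatory. The sketch below is a plain-Python version of the standard SMOTE idea: each synthetic sample lies on the segment between a minority sample and one of its k nearest minority neighbors. The parameters `k` and `seed` and the function name are illustrative. The paper's full pipeline (standardization, RFE, IV, PCA) is not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x (excluding x itself)
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # position along the segment x -> nb, in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic
```

Because every new point is a convex combination of two real minority samples, oversampling fills in the minority region rather than duplicating rows. For that reason, it is usually applied to the training split only.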
D-IMPACT: A Data Preprocessing Algorithm to Improve the Performance of Clustering
7
Authors: Vu Anh Tran, Osamu Hirose, Thammakorn Saethang, Lan Anh T. Nguyen, Xuan Tho Dang, Tu Kien T. Le, Duc Luu Ngo, Gavrilov Sergey, Mamoru Kubo, Yoichi Yamada, Kenji Satou. Journal of Software Engineering and Applications, 2014, Issue 8, pp. 639-654, 16 pages
In this study, we propose a data preprocessing algorithm called D-IMPACT, inspired by the IMPACT clustering algorithm. D-IMPACT iteratively moves data points based on attraction and density in order to detect and remove noise and outliers and to separate clusters. Experimental results on two-dimensional and practical datasets show that the algorithm produces new datasets on which clustering algorithms perform better.
Keywords: Attraction; Clustering; Data preprocessing; Density; Shrinking
An improved deep learning model for soybean future price prediction with hybrid data preprocessing strategy
8
Authors: Dingya CHEN, Hui LIU, Yanfei LI, Zhu DUAN. Frontiers of Agricultural Science and Engineering, 2025, Issue 2, pp. 208-230, 23 pages
The futures market is an important part of the financial markets, and soybeans are among the most strategically important crops in the world. Predicting soybean futures prices is a challenging topic studied by many researchers. This paper proposes a novel hybrid soybean futures price prediction model with two stages: data preprocessing and deep learning prediction. In the data preprocessing stage, the futures price series is decomposed into subsequences using ICEEMDAN (improved complete ensemble empirical mode decomposition with adaptive noise). The Lempel-Ziv complexity is then used to identify and reconstruct the high-frequency subsequences. Finally, the high-frequency component is decomposed a second time using variational mode decomposition optimized by the beluga whale optimization algorithm. In the deep learning prediction stage, a deep extreme learning machine optimized by the sparrow search algorithm predicts every subseries, and these predictions are reconstructed into the final soybean futures price prediction. Experiments on the soybean futures markets of China, Italy, and the United States show that the proposed hybrid method delivers superior prediction accuracy and robustness.
Keywords: Deep extreme learning machine; hybrid data preprocessing; optimization algorithm; soybean future price prediction
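The Lempel-Ziv complexity step in the soybean-futures abstract can be sketched as follows. The sketch binarizes a subsequence around its median and counts LZ76 phrases with the Kaspar-Schuster scan. Median binarization and the normalization c·log2(n)/n are common conventions assumed here. The abstract does not state the paper's threshold for classifying a subsequence as high-frequency.

```python
import math
from statistics import median


def lz76_complexity(s):
    """Number of phrases in the LZ76 parsing of a symbol sequence (Kaspar-Schuster)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:       # reached the end while copying: count the last phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:          # no earlier start reproduces the phrase: new phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(series):
    """Binarize around the median, then normalize LZ76 complexity by n / log2(n)."""
    m = median(series)
    bits = "".join("1" if x > m else "0" for x in series)
    n = len(bits)
    if n < 2:
        return 0.0
    return lz76_complexity(bits) * math.log2(n) / n
```

For the textbook sequence `0001101001000101`, the parse is 0 · 001 · 10 · 100 · 1000 · 101, so `lz76_complexity` returns 6. Noisier (higher-frequency) subsequences tend to score closer to 1 after normalization.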
Hybrid Teaching Reform and Practice in Big Data Collection and Preprocessing Courses Based on the Bosi Smart Learning Platform (cited by 1)
9
Authors: Yang Wang, Xuemei Wang, Wanyan Wang. Journal of Contemporary Educational Research, 2025, Issue 2, pp. 96-100, 5 pages
This study examines the Big Data Collection and Preprocessing course at Anhui Institute of Information Engineering, where a hybrid teaching reform was implemented using the Bosi Smart Learning Platform. The proposed hybrid model follows a "three-stage" and "two-subject" framework, with a structured design of teaching content and assessment methods before, during, and after class. Practical results indicate that this approach significantly enhances teaching effectiveness and improves students' learning autonomy.
Keywords: Big data collection and preprocessing; Bosi smart learning platform; Hybrid teaching; Teaching reform
Untargeted LC–MS Data Preprocessing in Metabolomics
10
Authors: He Tian, Bowen Li, Guanghou Shui. Journal of Analysis and Testing (EI), 2017, Issue 3, pp. 187-192, 6 pages
Liquid chromatography–mass spectrometry (LC–MS) has enabled the detection of thousands of metabolite features from a single biological sample, producing large and complex datasets. One key issue in LC–MS-based metabolomics is the comprehensive and accurate analysis of this enormous amount of data. Many free data preprocessing tools, such as XCMS, MZmine, MAVEN, and MetaboAnalyst, as well as commercial software, have been developed to facilitate data processing. However, researchers still face the unavoidable generation of numerous false-positive peaks and the human errors made while removing such peaks manually. Even as data processing tools keep improving, many mistakes can still be introduced during preprocessing. In addition, many preprocessing packages exist, each with its own advantages and disadvantages. Researchers therefore need to judge which software or tools best suit their vendor-proprietary formats and the goals of their downstream analysis. Here, we briefly introduce the general steps of raw MS data processing and the properties of automated data processing tools. We then summarize the characteristics of the main free data preprocessing software for researchers to consider when conducting metabolomics studies.
Keywords: Metabolomics; data preprocessing; LC-MS; Free software/tools
Handling missing data in large-scale TBM datasets: Methods, strategies, and applications (cited by 1)
11
Authors: Haohan Xiao, Ruilang Cao, Zuyu Chen, Chengyu Hong, Jun Wang, Min Yao, Litao Fan, Teng Luo. Intelligent Geoengineering, 2025, Issue 3, pp. 109-125, 17 pages
Substantial advances have been made in Tunnel Boring Machine (TBM) technology and monitoring systems, yet missing data impede accurate analysis and interpretation of TBM monitoring results. This study investigates the problem of missing data in large TBM datasets. Through a comprehensive literature review, we analyze the mechanisms that cause missing TBM data and compare different imputation methods, including statistical analysis and machine learning algorithms. We also examine how different missing patterns and missing rates affect the efficacy of these methods. Finally, we propose a dynamic interpolation strategy tailored to TBM engineering sites. The results show three things. First, the K-Nearest Neighbors (KNN) and Random Forest (RF) algorithms achieve good interpolation results. Second, the interpolation performance of every method declines as the missing rate increases. Third, interpolation is worst for block missing, intermediate for mixed missing, and best for sporadic missing. On-site application results confirm that the proposed strategy achieves robust missing-value interpolation, applicable in ML scenarios such as parameter optimization, attitude warning, and pressure prediction. These findings help make TBM missing-data processing more efficient and provide more effective support for large-scale TBM monitoring datasets.
Keywords: Tunnel boring machine (TBM); Missing data imputation; Machine learning (ML); Time series interpolation; data preprocessing; Real-time data stream
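The TBM missing-data abstract reports that KNN works well for imputation. The minimal sketch below illustrates the idea: each missing value (`None`) is filled with the mean of that column over the k nearest rows that observe it. Distance uses only the columns observed in both rows. The distance rule, `k`, and the function name are assumptions, and this is not the paper's dynamic interpolation strategy.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k nearest rows that observe that column."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        # Root-mean-square difference over shared columns, so rows with
        # different numbers of observed columns remain comparable.
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            # Donors come from the original rows, so earlier fills do not leak in.
            donors = [r for j, r in enumerate(rows) if j != i and r[col] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                filled[i][col] = sum(r[col] for r in nearest) / len(nearest)
    return filled
```

This treats rows as unordered samples. For TBM time series, restricting donors to a window of nearby timestamps would be a natural variation, in line with the abstract's observation that block gaps are harder to fill than sporadic ones.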
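Research on Image Preprocessing for Data Matrix Two-Dimensional Barcode Decoders (cited by 15)
12
Authors: Zou Yanxin, Yang Gaobo. Computer Engineering and Applications (CSCD; Peking University core journal), 2009, Issue 34, pp. 183-185, 188, 4 pages
The DM code is a common two-dimensional barcode, and image preprocessing is an important step in automatic recognition by a DM code decoder. This paper proposes a practical image preprocessing method for DM code recognition. Because it does not rely on traditional edge detection and line detection, it is less affected by background noise and geometric distortion. In addition, the method corrects the "railway line" coordinates and generates the code stream by sampling region by region, which significantly improves both the speed and the rate of DM code recognition. Experimental results show that the algorithm overcomes the susceptibility of DM code recognition to noise interference, uneven illumination, and geometric distortion.
Keywords: Two-dimensional barcode; Data Matrix; image preprocessing; localization; binarization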
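Construction and Application of a Specialized Production-Line Maintenance Large Model Based on DeepSeek-7B and LoRA Fine-Tuning
13
Authors: Jin Ran, Luo Xiwang, Tian Zhoupeng, Wang Pei, Ma Guangbin. Digital Communication World, 2026, Issue 1, pp. 51-53, 3 pages
This paper analyzes the problems of data quality, computing resources, and model adaptability that currently arise when fine-tuning large models on intelligent maintenance data. It then proposes concrete measures: improved data preprocessing techniques, enhanced model generalization, and optimized fine-tuning strategies. Drawing on typical application cases, the study examines in depth what succeeded and what failed during fine-tuning. It finds that sound data governance, an optimized model structure, and an appropriate fine-tuning strategy can significantly improve model accuracy and adaptability. Fine-tuning large models for intelligent maintenance therefore has broad application prospects as deep learning and big data continue to develop, although it still needs further optimization and improvement in practice.
Keywords: Intelligent maintenance; large model fine-tuning; data preprocessing; model optimization; DeepSeek platform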
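Determination of Alum Content in Water by Visible-Near-Infrared Spectroscopy Combined with the PLSR Algorithm
14
Authors: Li Zekun, Ji Ruonan, Wang Shaowei. Electronic Science and Technology, 2026, Issue 3, pp. 16-23, 8 pages
Alum, used as a water purifying agent, is colorless and transparent once dissolved, and its residue may pose a potential threat to human health. In this paper, visible-near-infrared spectroscopy is used to measure the spectra of alum solutions of different concentrations in different water bodies, such as pure water and pond water. A mapping between the spectral data and alum content was established with a partial least squares regression (PLSR) model, five-fold cross-validation, and model training. The model achieved a prediction coefficient of determination as high as 0.9900 and a root mean square error of prediction as low as 0.0017, enabling accurate prediction of alum content in water. The minimum detectable concentration is 0.1%, which provides technical support for rapid spectroscopic detection of alum residue during water purification.
Keywords: Visible-near-infrared spectroscopy; data preprocessing; machine learning; partial least squares regression algorithm; SPXY algorithm; cross-validation; alum content in water; water quality testing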
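The five-fold cross-validation in the alum-content abstract can be sketched as three pieces: an index splitter, the coefficient of determination R², and a driver that averages test-fold R². This is a generic sketch. The round-robin fold assignment after an optional shuffle is an assumption. The `fit` argument is a placeholder for a regression model such as PLSR, which is not implemented here, and neither is the SPXY sample partitioning.

```python
import random


def k_fold_indices(n, k=5, seed=None):
    """Split range(n) into k (train, test) index pairs; each index is tested exactly once."""
    idx = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # round-robin assignment to folds
    splits = []
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((train, folds[f]))
    return splits


def r2_score(y_true, y_pred):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1 - ss_res / ss_tot


def cross_val_r2(X, y, fit, k=5, seed=0):
    """Mean test-fold R^2; fit(X_train, y_train) must return a predict(x) callable."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(r2_score([y[i] for i in test], [predict(X[i]) for i in test]))
    return sum(scores) / len(scores)
```

Averaging test-fold scores, rather than scoring on the training data, is what makes the reported R² an estimate of performance on unseen spectra.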
Approach based on wavelet analysis for detecting and amending anomalies in dataset (cited by 1)
15
Authors: Peng Xiaoqi, Song Yanpo, Tang Ying, Zhang Jianzhi. Journal of Central South University of Technology (EI), 2006, Issue 5, pp. 491-495, 5 pages
In a dataset, anomalies are hard to detect when the matching relationship among some of their attributes differs greatly from that of the other samples. To address this problem, an approach based on wavelet analysis is proposed for detecting and amending anomalous samples. By taking full advantage of the multi-resolution and local-analysis properties of wavelets, the approach detects and amends anomalous samples effectively. To compute the wavelet transform of a discrete sequence rapidly, a modified algorithm based on the Newton-Cotes formula is also proposed. Experimental results show that the approach is feasible, effective, and practical.
Keywords: data preprocessing; wavelet analysis; anomaly detection; data mining
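The wavelet abstract relies on local, multi-resolution detail to flag and amend anomalies. A minimal one-level Haar sketch shows the idea. An isolated spike produces two large neighboring detail coefficients of opposite sign, so such a sample is flagged and replaced by the mean of its neighbors. The Haar wavelet, the robust MAD-based threshold, and the neighbor-mean amendment are all assumptions. The paper's own method and its Newton-Cotes-based transform are not reproduced.

```python
import math
from statistics import median


def haar_details(x):
    """Undecimated level-1 Haar detail coefficients d[i] = (x[i] - x[i+1]) / sqrt(2)."""
    return [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(len(x) - 1)]


def detect_and_amend(x, z=5.0):
    """Flag isolated spikes via Haar details and replace them by their neighbors' mean.

    Threshold: z * sigma, with sigma = median(|d|) / 0.6745 (a robust noise estimate).
    Returns (amended copy of x, list of flagged indices).
    """
    d = haar_details(x)
    sigma = median(abs(v) for v in d) / 0.6745
    thr = z * sigma
    amended, flagged = list(x), []
    for i in range(1, len(x) - 1):
        left, right = d[i - 1], d[i]  # (x[i-1] - x[i]) and (x[i] - x[i+1]), scaled
        # A spike makes both neighboring details large and of opposite sign.
        if abs(left) > thr and abs(right) > thr and left * right < 0:
            flagged.append(i)
            amended[i] = (x[i - 1] + x[i + 1]) / 2
    return amended, flagged
```

With `x = [0.0, 0.1, 0.2, 0.3, 5.0, 0.5, 0.6, 0.7, 0.8, 0.9]`, only index 4 is flagged, and it is amended to about 0.4. If most detail coefficients are exactly zero, sigma becomes zero and any small wiggle would be flagged, so a floor on `thr` may be needed in practice.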
Short-Term Mosques Load Forecast Using Machine Learning and Meteorological Data (cited by 1)
16
Authors: Musaed Alrashidi. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 7, pp. 371-387, 17 pages
The trend toward more sustainable, green buildings has turned many passive buildings into more dynamic ones. Mosques are a type of building with a unique energy usage pattern. Nevertheless, they have received minimal consideration in ongoing energy efficiency applications. This is due to the unpredictability of mosques' electrical consumption, which affects the stability of the distribution networks. This study addresses the issue by developing a framework for short-term electricity load forecasting for a mosque in Riyadh, Saudi Arabia. Using the mosque's load consumption and meteorological datasets, it investigates the performance of four forecasting algorithms: an Artificial Neural Network and Support Vector Regression (SVR) with each of three kernel functions, namely Radial Basis (RB), Polynomial, and Linear. In addition, this work examines the impact of 13 different combinations of input attributes, because selecting the optimal features strongly influences forecasting precision. For the mosque load, SVR-RB with eleven features was the best forecasting model, with the lowest error metrics: RMSE, nRMSE, MAE, and nMAE of 4.207 kW, 2.522%, 2.938 kW, and 1.761%, respectively.
Keywords: Big data harvesting; mosque load forecast; data preprocessing; machine learning; optimal features selection
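The four error metrics in the mosque-load abstract can be computed as follows. RMSE and MAE are standard. The abstract does not say how nRMSE and nMAE are normalized, so the sketch divides by a caller-supplied reference and, by assumption, defaults to the mean observed load. The reported figures are consistent with both normalized metrics sharing one reference of roughly 167 kW (4.207 / 0.02522 ≈ 2.938 / 0.01761 ≈ 167).

```python
import math


def forecast_errors(actual, predicted, reference=None):
    """Return RMSE, nRMSE (%), MAE and nMAE (%) for paired load values in kW."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    if reference is None:
        reference = sum(actual) / n  # assumption: normalize by the mean observed load
    return {
        "RMSE": rmse,
        "nRMSE": 100 * rmse / reference,
        "MAE": mae,
        "nMAE": 100 * mae / reference,
    }
```

Passing `reference` explicitly (for example, the peak load or the load range) makes it possible to match whichever normalization a given paper uses.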
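Research on a Regional Segmented Smoothing Algorithm for Micro-Line Segments in Tool Paths
17
Authors: Huang Wengui, Tang Qingchun, Huang Yukun, Liu Xinyu, Yang Hongkun. Coal Mine Machinery, 2026, Issue 1, pp. 54-58, 5 pages
Linear tool trajectories cause feed-rate fluctuation and poor machining quality on machine tools. To address these problems, a new regional segmented smoothing algorithm is proposed. First, the discrete data points are preprocessed according to inflection points, curvature extremum points, and chord-height feature points. Second, the regional segmented smoothing algorithm evaluates the preprocessed data and selects a suitable smoothing method. Finally, taking a butterfly-shaped test piece as an example, the algorithm is compared with traditional single smoothing algorithms through MATLAB simulation and actual machining. Simulation results show that preprocessing the data points reduces the number of micro-line segments by 96.30%, and that selecting an appropriate smoothing algorithm reduces the number of control points by 43.27% and the number of iterations by 48.71%. Actual machining confirms the correctness and feasibility of the algorithm.
Keywords: Discrete data points; data preprocessing; butterfly-shaped test piece; tool path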
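Two of the feature-point criteria named in the tool-path abstract are easy to state concretely for a 2D polyline. The first is a sign change in turning direction, which marks a discrete inflection point. The second is the chord (bow) height of a point relative to the chord joining its neighbors. The sketch below computes both. Treating the tool path as 2D and the indexing convention are simplifying assumptions. Curvature extrema and the smoothing-selection logic are omitted.

```python
import math


def turn(p0, p1, p2):
    """Z-component of (p1 - p0) x (p2 - p1); its sign gives the turning direction at p1."""
    return (p1[0] - p0[0]) * (p2[1] - p1[1]) - (p1[1] - p0[1]) * (p2[0] - p1[0])


def chord_height(p0, p1, p2):
    """Distance from p1 to the straight chord through p0 and p2."""
    chord = math.dist(p0, p2)
    if chord == 0:
        return math.dist(p0, p1)
    # |(p2 - p0) x (p1 - p0)| / |p2 - p0|
    cross = (p2[0] - p0[0]) * (p1[1] - p0[1]) - (p2[1] - p0[1]) * (p1[0] - p0[0])
    return abs(cross) / chord


def inflection_points(pts):
    """Indices of vertices where the turning direction changes sign."""
    turns = [turn(pts[i - 1], pts[i], pts[i + 1]) for i in range(1, len(pts) - 1)]
    result, prev = [], 0.0
    for offset, t in enumerate(turns):
        if t == 0:
            continue  # collinear vertex: keep the previous direction
        if prev and (t > 0) != (prev > 0):
            result.append(offset + 1)  # turns[offset] belongs to vertex offset + 1
        prev = t
    return result
```

A small chord height means the middle point adds little shape information, so runs of such points are candidates for merging. A sign change in `turn` marks where the path switches from bending left to bending right.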
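Application of PCA-Integrated RF and LSTM Models in Water Quality Prediction
18
Authors: Zhang Zhongzhi, Li Jun. Ground Water, 2026, Issue 1, pp. 153-156, 4 pages
Wolong Lake, the largest plain freshwater lake in Liaoning Province, has a catchment area of 1644.6 km² and is a shallow wetland lake. In 2012 it was listed as a national pilot lake for wetland lake protection and utilization, so its ecological protection is of great significance to the aquatic environment of Liaoning Province. The raw water quality data were standardized and normalized, and a principal component analysis (PCA) model was used to identify the main factors affecting water quality. These factors served as inputs to a Random Forest model, which predicted water quality by building multiple decision trees, with water quality evaluated by the single-factor index method. The preprocessed data were also fed to an LSTM model; after training fixed the model parameters, its outputs were de-normalized to obtain the final predictions. Experiments show that the scheme predicts the water quality of Wolong Lake accurately. The Random Forest model reached an accuracy of 85.7%. For the LSTM model, the root mean square error (RMSE) and mean absolute error (MAE) were small and close to 0, and the goodness of fit (R²) was close to 1.
Keywords: Water quality prediction; PCA; long short-term memory neural network; random forest model; data preprocessing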
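The water-quality abstract describes normalizing the inputs and de-normalizing the LSTM outputs. The sketch below shows that round trip with min-max scaling. Min-max (rather than another scaler) and the function names are assumptions, and the PCA, Random Forest, and LSTM stages are not shown. The scaling parameters are meant to be fitted on training data only, so the same `(minimum, range)` pair can later invert model outputs back into concentration units.

```python
def fit_min_max(values):
    """Return (minimum, range) of a training column; range 1.0 if the column is constant."""
    lo, hi = min(values), max(values)
    return lo, (hi - lo) or 1.0


def normalize(values, params):
    """Scale values into [0, 1] using fitted (minimum, range) parameters."""
    lo, span = params
    return [(v - lo) / span for v in values]


def denormalize(scaled, params):
    """Invert normalize(): map model outputs in scaled units back to original units."""
    lo, span = params
    return [s * span + lo for s in scaled]
```

For example, with parameters fitted on `[2.0, 4.0, 6.0]`, a model output of 0.25 de-normalizes to 3.0.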
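Well-Log Prediction of Tight Sandstone Diagenetic Facies Based on Extremely Randomized Trees
19
Authors: Chen Siyuan, Wu Feng, Liu Yu, Wang Jinxi, Li Tanglü, Li Jiaxin, Long Yujing, Wang Ao. Well Logging Technology, 2026, Issue 1, pp. 135-152, 18 pages
In the tight sandstones of the Sha-1 Member between the JQ gas field and the TF gas area, well-log responses differ only weakly between diagenetic facies, which makes fine identification difficult with traditional methods. An accurate diagenetic facies prediction model is therefore needed to support fine evaluation of tight sandstone reservoirs and oil and gas exploration. Diagenetic facies were classified using core cast thin-section identification combined with quantitative analysis of the apparent compaction rate, apparent cementation rate, and apparent dissolution rate. The log response characteristics of each diagenetic facies were analyzed systematically. Data quality was then improved through isolation forest anomaly detection, standardization, and the synthetic minority oversampling technique (SMOTE). An Extremely Randomized Trees (Extra Trees) prediction model was built and its parameters tuned. Its performance was compared with several other machine learning models, and its effectiveness was verified on an actual well. The results are as follows. (1) The study area contains three diagenetic facies: feldspar dissolution facies, laumontite cementation facies, and calcareous cementation facies. Their conventional log responses overlap, but they can be distinguished by subtle differences in parameters such as natural gamma, acoustic transit time, and resistivity. (2) Data processing yielded 918 normal samples. The Extra Trees model reached an accuracy of 86.02% on the training set and 87.50% on the test set, and 89.12% on 239 depth points in actual well A. (3) The model outperformed six common models, including random forest and support vector machine, in accuracy, precision, recall, and F1 score, and it captured the nonlinear relationship between diagenetic facies and log parameters better. (4) Misidentified samples are concentrated in intervals where the diagenetic facies change frequently, because of superimposed diagenesis and the limited vertical resolution of well logs. The study concludes that the Extra Trees model combined with multi-step data preprocessing effectively compensates for the shortcomings of traditional log interpretation. It offers a feasible technical approach for intelligently identifying tight sandstone diagenetic facies under superimposed diagenesis. Future work should add data from more wells and blocks and introduce high-resolution logging information to further improve the model's generalization and the depth of geological interpretation.
Keywords: Reservoir evaluation; machine learning; tight sandstone; diagenetic facies identification; extremely randomized trees; log response; data preprocessing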
Systematic review of data-centric approaches in artificial intelligence and machine learning (cited by 5)
20
Authors: Prerna Singh. Data Science and Management, 2023, Issue 3, pp. 144-157, 14 pages
Artificial intelligence (AI) relies on data and algorithms. State-of-the-art (SOTA) AI algorithms have been developed to improve the performance of AI-oriented systems. However, model-centric approaches are limited by the lack of high-quality data. Data-centric AI is an emerging approach to solving machine learning (ML) problems. It is a collection of data manipulation techniques that allow ML practitioners to systematically improve the quality of the data used in an ML pipeline. However, data-centric AI approaches are not well documented, and researchers have conducted various experiments without a clear set of guidelines. This survey highlights six major data-centric AI aspects that researchers already use, intentionally or unintentionally, to improve the quality of AI systems. These are big data quality assessment, data preprocessing, transfer learning, semi-supervised learning, machine learning operations (MLOps), and the effect of adding more data. The survey also highlights recent data-centric techniques adopted by ML practitioners. We address how adding data might harm datasets and how HoloClean can be used to restore and clean them. Finally, we discuss the causes of technical debt in AI. Technical debt builds up when software design and implementation decisions run into, or outright collide with, business goals and timelines. By summarizing various data-centric approaches, this survey lays the groundwork for future discussions of data-centric AI.
Keywords: Data-centric; Machine learning; Semi-supervised learning; Data preprocessing; MLOps; Data management; Technical debt