Journal Articles
34 articles found
1. TBM big data preprocessing method in machine learning and its application to tunneling
Authors: Xinyue Zhang, Xiaoping Zhang, Quansheng Liu, Weiqiang Xie, Shaohui Tang, Zengmao Wang. Journal of Rock Mechanics and Geotechnical Engineering, 2025, Issue 8, pp. 4762-4783 (22 pages).
The big data generated by tunnel boring machines (TBMs) are widely used to reveal complex rock-machine interactions by machine learning (ML) algorithms. Data preprocessing plays a crucial role in improving ML accuracy. For this, a TBM big data preprocessing method in ML was proposed in the present study. It emphasized the accurate division of the TBM tunneling cycle and the optimization of feature extraction. Based on data collected from a TBM water conveyance tunnel in China, its effectiveness was demonstrated by application to predicting TBM performance. Firstly, the Score-Kneedle (S-K) method was proposed to divide a TBM tunneling cycle into five phases. Applied to 500 TBM tunneling cycles, the S-K method accurately divided all five phases in 458 cycles (accuracy of 91.6%), which is superior to the conventional duration division method (accuracy of 74.2%). Additionally, the S-K method accurately divided the stable phase in 493 cycles (accuracy of 98.6%), which is superior to two state-of-the-art division methods, namely the histogram discriminant method (accuracy of 94.6%) and the cumulative sum change point detection method (accuracy of 92.8%). Secondly, features were extracted from the divided phases. Specifically, TBM tunneling resistances were extracted from the free rotating phase and free advancing phase. The resistances were subtracted from the total forces to represent the true rock-fragmentation forces. The secant slope and the mean value were extracted as features of the increasing phase and stable phase, respectively. Finally, an ML model integrating a deep neural network and a genetic algorithm (GA-DNN) was established to learn the preprocessed data. The GA-DNN used 6 secant slope features extracted from the increasing phase to predict the mean field penetration index (FPI) and torque penetration index (TPI) in the stable phase, guiding TBM drivers to make better decisions in advance. The results indicate that the proposed TBM big data preprocessing method can improve prediction accuracy significantly (improving the R² of TPI and FPI on the test dataset from 0.7716 to 0.9178 and from 0.7479 to 0.8842, respectively).
Keywords: Tunnel boring machine; Big data preprocessing; Division of tunneling cycle; Tunneling resistance; Machine learning
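Since the S-K division hinges on knee-point detection in a cycle's cumulative score curve, here is a minimal knee-detection sketch using the open-source kneed library; the synthetic concave score curve is a hypothetical stand-in for the paper's actual S-K scoring of TBM signals.

```python
# Minimal knee-point detection with the kneed library (pip install kneed).
# The concave "score" curve is synthetic; the paper's S-K scoring of thrust,
# torque, and penetration signals is more involved.
import numpy as np
from kneed import KneeLocator

t = np.arange(200)                       # time steps within one tunneling cycle
score = 1.0 - np.exp(-t / 40.0)          # synthetic concave, increasing score curve

kl = KneeLocator(t, score, curve="concave", direction="increasing")
print("estimated phase boundary at t =", kl.knee)
```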
2. Hybrid 1DCNN-Attention with Enhanced Data Preprocessing for Loan Approval Prediction
Authors: Yaru Liu, Huifang Feng. Journal of Computer and Communications, 2024, Issue 8, pp. 224-241 (18 pages).
To reduce the risk of non-performing loans and associated losses, and to improve loan approval efficiency, it is necessary to establish an intelligent loan risk and approval prediction system. A hybrid deep learning model with a 1DCNN-attention network and enhanced preprocessing techniques is proposed for loan approval prediction. The proposed model consists of enhanced data preprocessing and the stacking of multiple hybrid modules. First, the enhanced data preprocessing combines standardization, SMOTE oversampling, feature construction, recursive feature elimination (RFE), information value (IV) analysis, and principal component analysis (PCA), which not only eliminates the effects of data jitter and class imbalance but also removes redundant features while improving their representation. Subsequently, a hybrid module that combines a 1DCNN with an attention mechanism is proposed to extract local and global spatio-temporal features. Finally, comprehensive experiments validate that the proposed model surpasses state-of-the-art baseline models across various performance metrics, including accuracy, precision, recall, F1 score, and AUC. The proposed model helps to automate the loan approval process and provides scientific guidance to financial institutions for loan risk control.
Keywords: Loan approval prediction; Deep learning; One-dimensional convolutional neural network; Attention mechanism; Data preprocessing
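A preprocessing chain of this shape can be expressed compactly with scikit-learn and imbalanced-learn; the sketch below is a generic standardize → SMOTE → RFE → PCA pipeline under assumed hyperparameters, not the authors' exact configuration (their IV-based filtering and feature construction are omitted).

```python
# Generic standardize -> SMOTE -> RFE -> PCA -> classifier pipeline
# (pip install scikit-learn imbalanced-learn). Hyperparameters are illustrative.
from imblearn.pipeline import Pipeline          # imblearn Pipeline allows resamplers
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ("scale", StandardScaler()),                # remove scale effects ("data jitter")
    ("smote", SMOTE(random_state=0)),           # rebalance the minority class
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=20)),
    ("pca", PCA(n_components=10)),              # compress remaining redundancy
    ("clf", LogisticRegression(max_iter=1000)), # stand-in for the 1DCNN-attention model
])
# pipe.fit(X_train, y_train); pipe.score(X_test, y_test)
```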
3. Data preprocessing and preliminary results of the Moon-based Ultraviolet Telescope on the CE-3 lander (Cited by: 4)
Authors: Wei-Bin Wen, Fang Wang, Chun-Lai Li, Jing Wang, Li Cao, Jian-Jun Liu, Xu Tan, Yuan Xiao, Qiang Fu, Yan Su, Wei Zuo. Research in Astronomy and Astrophysics, 2014, Issue 12, pp. 1674-1681 (8 pages).
The Moon-based Ultraviolet Telescope (MUVT) is one of the payloads on the Chang'e-3 (CE-3) lunar lander. Because of the advantages of having no atmospheric disturbances and the slow rotation of the Moon, we can make long-term continuous observations of a series of important celestial objects in the near-ultraviolet band (245-340 nm), and perform a sky survey of selected areas, which cannot be completed on Earth. We can find characteristic changes in celestial brightness with time by analyzing image data from the MUVT, and deduce the radiation mechanism and physical properties of these celestial objects after comparing with a physical model. In order to explain the scientific purposes of the MUVT, this article analyzes the preprocessing of MUVT image data and makes a preliminary evaluation of data quality. The results demonstrate that the methods used for data collection and preprocessing are effective, and the Level 2A and 2B image data satisfy the requirements of follow-up scientific research.
Keywords: Chang'e-3 mission; Moon-based Ultraviolet Telescope; data preprocessing; near-ultraviolet band
4. Diabetes Type 2: Poincaré Data Preprocessing for Quantum Machine Learning (Cited by: 1)
Authors: Daniel Sierra-Sosa, Juan D. Arcila-Moreno, Begonya Garcia-Zapirain, Adel Elmaghraby. Computers, Materials & Continua, 2021, Issue 5, pp. 1849-1861 (13 pages).
Quantum machine learning (QML) techniques have recently attracted massive interest. However, reported applications usually employ synthetic or well-known datasets. One of these techniques, based on a hybrid approach combining quantum and classical devices, is the Variational Quantum Classifier (VQC), whose development seems promising. Albeit largely studied, VQC implementations for "real-world" datasets are still challenging on Noisy Intermediate-Scale Quantum (NISQ) devices. In this paper we propose a preprocessing pipeline based on Stokes parameters for data mapping. This pipeline enhances prediction rates when applying VQC techniques, improving the feasibility of solving classification problems on NISQ devices. By including feature selection techniques and geometrical transformations, enhanced quantum state preparation is achieved. A representation based on the Stokes parameters on the Poincaré sphere also makes it possible to visualize the data. Our results show that the proposed techniques improve the classification score for the incidence of acute comorbid diseases in Type 2 Diabetes Mellitus patients. We used the implementation of VQC available in IBM's Qiskit framework and obtained accuracies of 70% and 72% with two and three qubits, respectively.
Keywords: Quantum machine learning; data preprocessing; Stokes parameters; Poincaré sphere
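As a rough illustration of the Stokes/Poincaré idea, the sketch below maps a two-feature sample to a normalized two-component complex state and computes its Stokes parameters, which place the sample on the Poincaré sphere. The feature-to-amplitude encoding is an assumption for illustration; the paper's pipeline (feature selection plus geometric transformations) is more elaborate, and sign conventions for S2/S3 vary in the literature.

```python
# Map a 2-feature sample onto the Poincaré sphere via Stokes parameters.
# The feature-to-complex-amplitude encoding below is a hypothetical choice.
import numpy as np

def stokes_from_features(f1: float, f2: float):
    # Encode two (scaled) features as amplitudes of a normalized 2-component state.
    ex, ey = complex(f1), complex(f2)
    norm = np.sqrt(abs(ex) ** 2 + abs(ey) ** 2)
    ex, ey = ex / norm, ey / norm
    s0 = abs(ex) ** 2 + abs(ey) ** 2           # = 1 after normalization
    s1 = abs(ex) ** 2 - abs(ey) ** 2
    s2 = 2 * (ex.conjugate() * ey).real        # sign conventions vary
    s3 = 2 * (ex.conjugate() * ey).imag
    return s0, s1, s2, s3                      # (s1, s2, s3) lies on the unit sphere

print(stokes_from_features(0.8, 0.6))          # -> (1.0, 0.28, 0.96, 0.0)
```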
5. Power Data Preprocessing Method of Mountain Wind Farm Based on POT-DBSCAN (Cited by: 1)
Authors: Anfeng Zhu, Zhao Xiao, Qiancheng Zhao. Energy Engineering, 2021, Issue 3, pp. 549-563 (15 pages).
Due to frequent changes in wind speed and wind direction, the accuracy of wind turbine (WT) power prediction using traditional data preprocessing methods is low. This paper proposes a data preprocessing method which combines POT with DBSCAN (POT-DBSCAN) to improve the efficiency of the wind power prediction model. Firstly, based on data from WTs under normal operating conditions, a WT power prediction model is established using the Particle Swarm Optimization algorithm combined with a BP neural network (PSO-BP). Secondly, the wind-power data obtained from the supervisory control and data acquisition (SCADA) system are preprocessed by the POT-DBSCAN method. Then, power prediction on the preprocessed data is carried out by the PSO-BP model. Finally, the necessity of preprocessing is verified by the evaluation indexes. The case analysis shows that the prediction results with POT-DBSCAN preprocessing are better than those with the quartile method. Therefore, the quality of the data and the accuracy of the prediction model can be improved by this method.
Keywords: Wind turbine; SCADA data; data preprocessing method; power prediction
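The DBSCAN half of such a scheme is straightforward to sketch with scikit-learn: points in the wind speed-power plane that DBSCAN labels as noise (-1) are treated as anomalous SCADA records and dropped. The eps/min_samples values are illustrative, and the peaks-over-threshold (POT) stage is omitted.

```python
# Flag anomalous SCADA (wind speed, power) records with DBSCAN; label -1 = noise.
# eps and min_samples are illustrative and would need tuning per wind farm.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
speed = rng.uniform(3, 15, 500)
power = 100 * speed ** 3 / 15 ** 3 + rng.normal(0, 2, 500)   # synthetic power curve
power[::50] = rng.uniform(0, 100, 10)                        # inject curtailment-like outliers

X = StandardScaler().fit_transform(np.column_stack([speed, power]))
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)
clean = np.column_stack([speed, power])[labels != -1]
print(f"kept {len(clean)} of 500 records")
```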
6. Data Preprocessing and Re-Kernel Clustering for Letter Recognition
Authors: Zhu Changming, Gao Daqi. Journal of Electronics (China), 2014, Issue 6, pp. 552-564 (13 pages).
Many classifiers and methods have been proposed to deal with the letter recognition problem. Among them, clustering is a widely used method, but clustering only once is not adequate. Here, we adopt data preprocessing and a re-kernel clustering method to tackle the letter recognition problem. To validate the effectiveness and efficiency of the proposed method, we introduce re-kernel clustering into Kernel Nearest Neighbor classification (KNN), Radial Basis Function Neural Networks (RBFNN), and Support Vector Machines (SVM). Furthermore, we compare re-kernel clustering with one-time kernel clustering (denoted kernel clustering for short). Experimental results validate that re-kernel clustering forms fewer and more feasible kernels and attains higher classification accuracy.
Keywords: Data preprocessing; Kernel clustering; Kernel Nearest Neighbor (KNN); Re-kernel clustering
7. An improved deep learning model for soybean future price prediction with hybrid data preprocessing strategy
Authors: Dingya Chen, Hui Liu, Yanfei Li, Zhu Duan. Frontiers of Agricultural Science and Engineering, 2025, Issue 2, pp. 208-230 (23 pages).
The futures trading market is an important part of the financial markets, and soybeans are one of the most strategically important crops in the world. How to predict the soybean futures price is a challenging topic studied by many researchers. This paper proposes a novel hybrid soybean futures price prediction model which includes two stages: data preprocessing and deep learning prediction. In the data preprocessing stage, futures price series are decomposed into subsequences using the ICEEMDAN (improved complete ensemble empirical mode decomposition with adaptive noise) method. The Lempel-Ziv complexity measure is then used to identify and reconstruct high-frequency subsequences. Finally, the high-frequency component is decomposed a second time using variational mode decomposition optimized by the beluga whale optimization algorithm. In the deep learning prediction stage, a deep extreme learning machine optimized by the sparrow search algorithm is used to obtain predictions for all subseries, which are reconstructed to yield the final soybean futures price prediction. Experimental results on soybean futures markets in China, Italy, and the United States show that the proposed hybrid method provides superior performance in terms of prediction accuracy and robustness.
Keywords: Deep extreme learning machine; hybrid data preprocessing; optimization algorithm; soybean future price prediction
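The decomposition stage can be prototyped with the PyEMD package (pip install EMD-signal), which implements CEEMDAN; the improved ICEEMDAN variant used in the paper is not part of that package, so CEEMDAN serves here as a stand-in, and the price series is synthetic.

```python
# Decompose a synthetic price-like series into IMFs with CEEMDAN
# (pip install EMD-signal). CEEMDAN stands in for the paper's ICEEMDAN.
import numpy as np
from PyEMD import CEEMDAN

t = np.linspace(0, 1, 512)
price = (100 + 5 * np.sin(2 * np.pi * 5 * t) + 2 * t
         + np.random.default_rng(0).normal(0, 0.5, 512))

imfs = CEEMDAN()(price)              # rows = IMFs, ordered high to low frequency
print("number of IMFs:", imfs.shape[0])
# In the paper's pipeline, high-frequency IMFs (e.g., imfs[0]) would then be
# screened by Lempel-Ziv complexity and decomposed again with VMD.
```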
8. Hybrid Teaching Reform and Practice in Big Data Collection and Preprocessing Courses Based on the Bosi Smart Learning Platform (Cited by: 1)
Authors: Yang Wang, Xuemei Wang, Wanyan Wang. Journal of Contemporary Educational Research, 2025, Issue 2, pp. 96-100 (5 pages).
This study examines the Big Data Collection and Preprocessing course at Anhui Institute of Information Engineering, implementing a hybrid teaching reform using the Bosi Smart Learning Platform. The proposed hybrid model follows a "three-stage" and "two-subject" framework, incorporating a structured design for teaching content and assessment methods before, during, and after class. Practical results indicate that this approach significantly enhances teaching effectiveness and improves students' learning autonomy.
Keywords: Big data collection and preprocessing; Bosi Smart Learning Platform; Hybrid teaching; Teaching reform
9. Untargeted LC-MS Data Preprocessing in Metabolomics
Authors: He Tian, Bowen Li, Guanghou Shui. Journal of Analysis and Testing, 2017, Issue 3, pp. 187-192 (6 pages).
Liquid chromatography-mass spectrometry (LC-MS) has enabled the detection of thousands of metabolite features from a single biological sample, producing large and complex datasets. One of the key issues in LC-MS-based metabolomics is the comprehensive and accurate analysis of this enormous amount of data. Many free data preprocessing tools, such as XCMS, MZmine, MAVEN, and MetaboAnalyst, as well as commercial software packages, have been developed to facilitate data processing. However, researchers are challenged by the inevitable yield of numerous false-positive peaks and by human error when manually removing such false peaks. Even with continuous improvements to data processing tools, many mistakes can still be generated during preprocessing. In addition, many data preprocessing programs exist, and every tool has its own advantages and disadvantages. A researcher therefore needs to judge which software or tools best suit their vendor's proprietary formats and the goal of the downstream analysis. Here, we provide a brief introduction to the general steps of raw MS data processing and the properties of automated data processing tools. We then summarize the characteristics of the main free data preprocessing software packages for researchers' consideration when conducting a metabolomics study.
Keywords: Metabolomics; data preprocessing; LC-MS; free software/tools
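One of the earliest steps these tools automate is peak detection on a chromatogram; a minimal, tool-agnostic version of that step can be sketched with scipy, with all thresholds illustrative (real LC-MS peak pickers such as those in XCMS or MZmine work jointly on the m/z and retention-time dimensions).

```python
# Minimal chromatographic peak detection sketch with scipy (thresholds illustrative).
# Real LC-MS preprocessing tools are far more sophisticated than this.
import numpy as np
from scipy.signal import find_peaks

rt = np.linspace(0, 10, 1000)                     # retention time (min)
tic = (np.exp(-(rt - 3) ** 2 / 0.01)              # synthetic chromatogram: two peaks
       + 0.6 * np.exp(-(rt - 7) ** 2 / 0.02)
       + np.random.default_rng(1).normal(0, 0.01, 1000))

peaks, props = find_peaks(tic, height=0.1, prominence=0.05, width=3)
print("peak apexes at RT:", np.round(rt[peaks], 2))
```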
10. Real-Time Ship Roll Prediction via a Novel Stochastic Trainer-Based Feedforward Neural Network
Authors: Xu Dong-xing, Yin Jian-chuan. China Ocean Engineering, 2025, Issue 4, pp. 608-620 (13 pages).
Enhancing the accuracy of real-time ship roll prediction is crucial for maritime safety and operational efficiency. To address the challenge of accurately predicting ship roll, which exhibits nonlinear time-varying dynamic characteristics, a real-time ship roll prediction scheme is proposed on the basis of a data preprocessing strategy and a novel stochastic trainer-based feedforward neural network. A sliding data window serves as an observer of the ship's time-varying dynamics and enhances the stability of model predictions. The variational mode decomposition method extracts the effective information in ship roll motion and reduces the non-stationarity of the series. The energy entropy method reconstructs the mode components into high-frequency, medium-frequency, and low-frequency series to reduce model complexity. An improved black widow optimization algorithm trainer-based feedforward neural network with enhanced local-optimum avoidance predicts the high-frequency component, enabling accurate tracking of abrupt signals. Additionally, a deterministic algorithm trainer-based neural network, characterized by rapid processing speed, predicts the remaining two mode components. Real-time ship roll forecasts are then obtained by reconstructing the mode component predictions. The feasibility and effectiveness of the proposed hybrid prediction scheme are demonstrated on measured data from a full-scale ship trial, where it achieves real-time ship roll prediction with superior accuracy.
Keywords: ship roll prediction; data preprocessing strategy; sliding data window; improved black widow optimization algorithm; stochastic trainer; feedforward neural network
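The sliding-data-window idea, turning a roll time series into (window, next-value) training pairs that are refreshed as new samples arrive, can be sketched in a few lines of numpy; the window length is an illustrative choice.

```python
# Build (window, next-value) pairs from a roll-angle series for online training.
# Window length is illustrative; in practice it is tuned to the ship's dynamics.
import numpy as np

def sliding_windows(series: np.ndarray, width: int):
    X = np.stack([series[i:i + width] for i in range(len(series) - width)])
    y = series[width:]                 # one-step-ahead targets
    return X, y

roll = np.sin(np.linspace(0, 20, 400)) + 0.1 * np.random.default_rng(2).normal(size=400)
X, y = sliding_windows(roll, width=30)
print(X.shape, y.shape)                # (370, 30) (370,)
```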
11. Identification of working conditions and prediction of NO_(x) emissions in iron ore fines sintering process
Authors: Bao-rong Wang, Xiao-ming Li, Zhi-heng Yu, Xu-hui Lin, Yi-ze Ren, Xiang-dong Xing. Journal of Iron and Steel Research International, 2025, Issue 8, pp. 2277-2285 (9 pages).
Predicting NO_(x) in the iron ore fines sintering process in advance is helpful for adjusting the denitrification process in time. Taking NO_(x) in the sintering process as the object, methods including the boxplot, the empirical mode decomposition algorithm, the Pearson correlation coefficient, and the maximum information coefficient were used to preprocess the sintering data, and a naive Bayes classification algorithm was used to identify the sintering conditions. Regression prediction models with high accuracy and good stability were selected as sub-models for the different sintering conditions, and the sub-models were combined into an integrated prediction model. Based on actual operational data, the approach proved superior and effective in predicting NO_(x), yielding an accuracy of 96.17% and an absolute error of 5.56, thereby providing valuable foresight for on-site sintering operations.
Keywords: Iron ore fines sintering; Operating condition recognition; NO_(x) emission; Data preprocessing; Integrated prediction model
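The boxplot step is the classic interquartile-range outlier rule; a minimal pandas version with the usual 1.5×IQR fences is sketched below (the paper's remaining preprocessing, EMD and correlation-based feature screening, is not shown).

```python
# Boxplot (1.5 * IQR) outlier filter for one sintering signal, e.g. NOx readings.
import pandas as pd

def iqr_filter(s: pd.Series, k: float = 1.5) -> pd.Series:
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s >= q1 - k * iqr) & (s <= q3 + k * iqr)]   # drop values outside fences

nox = pd.Series([210, 215, 212, 900, 208, 211, 5, 214])  # two implausible readings
print(iqr_filter(nox).tolist())                           # outliers 900 and 5 removed
```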
12. Approach based on wavelet analysis for detecting and amending anomalies in dataset (Cited by: 1)
Authors: Peng Xiaoqi, Song Yanpo, Tang Ying, Zhang Jianzhi. Journal of Central South University of Technology, 2006, Issue 5, pp. 491-495 (5 pages).
It is difficult to detect anomalies whose matching relationship among some data attributes is very different from that of the others in a dataset. Aiming at this problem, an approach based on wavelet analysis for detecting and amending anomalous samples was proposed. Taking full advantage of the multi-resolution and local analysis properties of wavelet analysis, this approach is able to detect and amend anomalous samples effectively. To realize rapid numerical computation of the wavelet transform for a discrete sequence, a modified algorithm based on the Newton-Cotes formula was also proposed. The experimental results show that the approach is feasible, with good results and good practicality.
Keywords: data preprocessing; wavelet analysis; anomaly detection; data mining
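A common way to realize wavelet-based anomaly detection is to examine the detail coefficients of a discrete wavelet decomposition and flag samples whose coefficients exceed a robust threshold; the sketch below uses PyWavelets with an illustrative MAD-based 3-sigma rule, not the authors' Newton-Cotes-accelerated scheme.

```python
# Flag anomalies via detail coefficients of a single-level DWT
# (pip install PyWavelets). The 3-sigma MAD threshold is an illustrative choice,
# and boundary coefficients may also be flagged in practice.
import numpy as np
import pywt

x = np.sin(np.linspace(0, 8 * np.pi, 512))
x[200] += 2.5                                     # inject a spike anomaly

approx, detail = pywt.dwt(x, "db4")               # single-level DWT
sigma = np.median(np.abs(detail)) / 0.6745        # robust sigma estimate (MAD)
hits = np.where(np.abs(detail) > 3 * sigma)[0]
print("anomalies near samples:", hits * 2)        # detail coeffs are at half rate
```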
13. Short-Term Mosques Load Forecast Using Machine Learning and Meteorological Data (Cited by: 1)
Authors: Musaed Alrashidi. Computer Systems Science & Engineering, 2023, Issue 7, pp. 371-387 (17 pages).
The tendency toward achieving more sustainable and green buildings has turned several passive buildings into more dynamic ones. Mosques are a type of building with a unique energy usage pattern. Nevertheless, these buildings receive minimal consideration in ongoing energy efficiency applications, due to the unpredictability of their electrical consumption, which affects the stability of distribution networks. This study addresses the issue by developing a framework for short-term electricity load forecasting for a mosque located in Riyadh, Saudi Arabia. Using the mosque's load consumption and meteorological datasets, the performance of four forecasting algorithms is investigated: an Artificial Neural Network and Support Vector Regression (SVR) based on three kernel functions, Radial Basis (RB), Polynomial, and Linear. In addition, this work examines the impact of 13 different combinations of input attributes, since selecting the optimal features has a major influence on yielding precise forecasting outcomes. For the mosque load, the SVR-RB model with eleven features was the best forecasting model, with the lowest error metrics: RMSE, nRMSE, MAE, and nMAE values of 4.207 kW, 2.522%, 2.938 kW, and 1.761%, respectively.
Keywords: Big data harvesting; mosque load forecast; data preprocessing; machine learning; optimal feature selection
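An RBF-kernel SVR load forecaster of this kind takes only a few lines with scikit-learn; the features and hyperparameters below are placeholders, not the study's eleven selected attributes.

```python
# RBF-kernel SVR for short-term load forecasting (features/hyperparameters illustrative).
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
temp = rng.uniform(20, 45, 300)                       # ambient temperature (degC)
hour = rng.integers(0, 24, 300)                       # hour of day
load = 10 + 0.4 * temp + 5 * np.sin(hour / 24 * 2 * np.pi) + rng.normal(0, 1, 300)

X = np.column_stack([temp, hour])
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X[:250], load[:250])
print("predicted load (kW):", model.predict(X[250:255]).round(2))
```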
14. Systematic review of data-centric approaches in artificial intelligence and machine learning (Cited by: 4)
Authors: Prerna Singh. Data Science and Management, 2023, Issue 3, pp. 144-157 (14 pages).
Artificial intelligence (AI) relies on data and algorithms. State-of-the-art (SOTA) smart AI algorithms have been developed to improve the performance of AI-oriented structures. However, model-centric approaches are limited by the absence of high-quality data. Data-centric AI is an emerging approach for solving machine learning (ML) problems: a collection of data manipulation techniques that allow ML practitioners to systematically improve the quality of the data used in an ML pipeline. However, data-centric AI approaches are not well documented, and researchers have conducted various experiments without a clear set of guidelines. This survey highlights six major data-centric AI aspects that researchers already use, intentionally or unintentionally, to improve the quality of AI systems: big data quality assessment, data preprocessing, transfer learning, semi-supervised learning, machine learning operations (MLOps), and the effect of adding more data. In addition, it highlights recent data-centric techniques adopted by ML practitioners. We address how adding data might harm datasets and how HoloClean can be used to restore and clean them. Finally, we discuss the causes of technical debt in AI; technical debt builds up when software design and implementation decisions run into, or outright collide with, business goals and timelines. This survey lays the groundwork for future data-centric AI discussions by summarizing various data-centric approaches.
Keywords: Data-centric; Machine learning; Semi-supervised learning; Data preprocessing; MLOps; Data management; Technical debt
15. Time-varying Reliability Analysis of Long-span Continuous Rigid Frame Bridge under Cantilever Construction Stage Based on the Monitored Strain Data (Cited by: 1)
Authors: Yinghua Li, Kesheng Peng, Lurong Cai, Junyong He. Journal of Architectural Environment & Structural Engineering Research, 2020, Issue 1, pp. 5-16 (12 pages).
In general, the material properties, loads, and resistance of a prestressed concrete continuous rigid frame bridge vary over the different construction stages, so it is essential to monitor the internal force state while the bridge is under construction; assessing its safety is one of the key challenges. As continuous monitoring over a long period increases the reliability of the assessment, this paper proposes a method for calculating punctiform time-varying reliability, based on a large number of monitored strain data collected by the structural health monitoring system (SHMS) during construction, to evaluate the stress state of this type of bridge in the cantilever construction stage using basic reliability theory. At the same time, the optimal stress distribution function for the mid-span bottom plate when the bridge is closed is determined. This method can provide a basis and direction for controlling the internal forces of such bridges during construction, and can thereby reduce safety and quality accidents during the construction stages.
Keywords: Continuous rigid frame bridge; Structural health monitoring; Construction stage; Punctiform time-varying reliability; Strain data preprocessing
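Under basic first-order reliability theory with normally distributed resistance R and load effect S, the reliability index is β = (μR − μS)/√(σR² + σS²) and the failure probability is Pf = Φ(−β); the sketch below computes this from hypothetical monitored-strain statistics (the paper additionally fits the best-matching stress distribution, which need not be normal).

```python
# Punctiform reliability index from monitored stress statistics (values hypothetical).
# Assumes normal R (resistance) and S (load effect); beta = (muR - muS)/sqrt(sR^2 + sS^2).
from math import sqrt
from scipy.stats import norm

mu_R, sigma_R = 32.4, 2.1     # concrete compressive resistance (MPa), assumed
mu_S, sigma_S = 18.7, 1.6     # monitored stress at one section (MPa), assumed

beta = (mu_R - mu_S) / sqrt(sigma_R**2 + sigma_S**2)
pf = norm.cdf(-beta)          # probability of failure
print(f"beta = {beta:.2f}, Pf = {pf:.2e}")
```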
16. Robust Network Security: A Deep Learning Approach to Intrusion Detection in IoT
Authors: Ammar Odeh, Anas Abu Taleb. Computers, Materials & Continua, 2024, Issue 12, pp. 4149-4169 (21 pages).
The proliferation of Internet of Things (IoT) technology has exponentially increased the number of devices interconnected over networks, thereby escalating the potential vectors for cybersecurity threats. In response, this study rigorously applies and evaluates deep learning models, namely Convolutional Neural Networks (CNN), Autoencoders, and Long Short-Term Memory (LSTM) networks, to engineer an advanced Intrusion Detection System (IDS) specifically designed for IoT environments. Utilizing the comprehensive UNSW-NB15 dataset, which encompasses 49 distinct features representing varied network traffic characteristics, the methodology focused on meticulous data preprocessing, including cleaning, normalization, and strategic feature selection, to enhance model performance. A robust comparative analysis highlights the CNN model's outstanding performance, achieving an accuracy of 99.89%, precision of 99.90%, recall of 99.88%, and an F1 score of 99.89% in binary classification tasks, significantly outperforming the other evaluated models. These results not only confirm the superior detection capabilities of CNNs in distinguishing between benign and malicious network activities but also illustrate the model's effectiveness in multiclass classification tasks addressing the various attack vectors prevalent in IoT setups. The empirical findings demonstrate deep learning's transformative potential in fortifying network security infrastructures against sophisticated cyber threats, providing a scalable, high-performance solution that enhances security measures across increasingly complex IoT ecosystems. These outcomes are critical for security practitioners and researchers focusing on the next generation of cyber defense mechanisms, offering a data-driven foundation for future advancements in IoT security strategies.
Keywords: Intrusion detection system (IDS); Internet of Things (IoT); convolutional neural network (CNN); long short-term memory (LSTM); autoencoder; network security; deep learning; data preprocessing; feature selection; cyber threats
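A minimal 1D CNN for binary intrusion detection over 49 preprocessed features might look like the Keras sketch below; the layer sizes are illustrative, not the architecture reported in the paper.

```python
# Minimal 1D CNN for binary intrusion detection on 49 features (sizes illustrative).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(49, 1)),            # 49 UNSW-NB15 features as a 1D signal
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # benign vs. malicious
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_train.reshape(-1, 49, 1), y_train, epochs=10, validation_split=0.1)
```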
17. Automated data processing and feature engineering for deep learning and big data applications: A survey (Cited by: 3)
Authors: Alhassan Mumuni, Fuseini Mumuni. Journal of Information and Intelligence, 2025, Issue 2, pp. 113-153 (41 pages).
The modern approach to artificial intelligence (AI) aims to design algorithms that learn directly from data. This approach has achieved impressive results and has contributed significantly to the progress of AI, particularly in the sphere of supervised deep learning. It has also simplified the design of machine learning systems, as the learning process is highly automated. However, not all data processing tasks in conventional deep learning pipelines have been automated: in most cases data must be manually collected, preprocessed, and further extended through data augmentation before it can be effective for training. Recently, special techniques for automating these tasks have emerged, driven by the need to utilize large volumes of complex, heterogeneous data for machine learning and big data applications. Today, end-to-end automated data processing systems based on automated machine learning (AutoML) techniques are capable of taking raw data and transforming it into useful features for big data tasks by automating all intermediate processing stages. In this work, we present a thorough review of approaches for automating data processing tasks in deep learning pipelines, including automated data preprocessing (e.g., data cleaning, labeling, missing data imputation, and categorical data encoding), data augmentation (including synthetic data generation using generative AI methods), and feature engineering (specifically, automated feature extraction, feature construction, and feature selection). In addition to the automation of specific data processing tasks, we discuss the use of AutoML methods and tools to simultaneously optimize all stages of the machine learning pipeline.
Keywords: AutoML; Automated data preprocessing; Data processing; Automated feature engineering; Generative artificial intelligence; Big data
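The preprocessing tasks the survey lists (imputation, categorical encoding, scaling) are exactly what a declarative scikit-learn ColumnTransformer automates once per pipeline; a minimal sketch with assumed column names follows.

```python
# Declarative preprocessing: imputation + encoding + scaling in one transformer.
# Column names ("age", "bytes", "proto", "service") are hypothetical.
# Requires scikit-learn >= 1.2 for the sparse_output parameter.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric = ["age", "bytes"]
categorical = ["proto", "service"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore",
                                               sparse_output=False))]), categorical),
])

df = pd.DataFrame({"age": [1.0, None, 3.0], "bytes": [100, 250, None],
                   "proto": ["tcp", "udp", None], "service": ["http", "dns", "http"]})
print(preprocess.fit_transform(df))
```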
18. Predicting 3D Radiotherapy Dose-Volume Based on Deep Learning
Authors: Do Nang Toan, Lam Thanh Hien, Ha Manh Toan, Nguyen Trong Vinh, Pham Trung Hieu. Intelligent Automation & Soft Computing, 2024, Issue 2, pp. 319-335 (17 pages).
Cancer is one of the most dangerous diseases, with high mortality. One of the principal treatments is radiotherapy, which uses radiation beams to destroy cancer cells, and this workflow requires a lot of experience and skill from doctors and technicians. In our study, we focused on the 3D dose prediction problem in radiotherapy by applying a deep learning approach to computed tomography (CT) images of cancer patients. Medical image data has more complex characteristics than ordinary image data, and this research explores the effectiveness of data preprocessing and augmentation in the context of 3D dose prediction. We proposed four strategies to test our hypothesis on different aspects of applying data preprocessing and augmentation. In each strategy, we trained a custom convolutional neural network with a structure inspired by the U-net, to which residual blocks were also applied. The output of the network passes through a rectified linear unit (ReLU) for each pixel to ensure there are no negative values, which would be absurd for radiation doses. Our experiments were conducted on the dataset of the Open Knowledge-Based Planning Challenge, collected from head and neck cancer patients treated with radiation therapy. The results of the four strategies show that our hypothesis is rational when evaluated in terms of the Dose score and the Dose-volume histogram score (DVH score). In the best training cases, the Dose score is 3.08 and the DVH score is 1.78. In addition, we conducted a comparison with the results of another study in the same context of loss function usage.
Keywords: CT image; 3D dose prediction; data preprocessing; augmentation
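The non-negativity trick mentioned in the abstract, ending the network with a per-voxel ReLU so predicted doses cannot go negative, can be shown in a few Keras lines; the convolution sizes are placeholders, not the authors' U-net.

```python
# Per-voxel ReLU output head so predicted doses are never negative (sizes illustrative).
import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(64, 64, 64, 1))       # CT volume patch
x = tf.keras.layers.Conv3D(16, 3, padding="same", activation="relu")(inputs)
x = tf.keras.layers.Conv3D(16, 3, padding="same", activation="relu")(x)
dose = tf.keras.layers.Conv3D(1, 1, padding="same", activation="relu")(x)  # clamp >= 0

model = tf.keras.Model(inputs, dose)
model.compile(optimizer="adam", loss="mae")                  # MAE is a common dose loss
model.summary()
```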
19. Optimizing Network Security via Ensemble Learning: A Nexus with Intrusion Detection
Authors: Anu Baluguri, Vasudha Pasumarthy, Indranil Roy, Bidyut Gupta, Nick Rahimi. Journal of Information Security, 2024, Issue 4, pp. 545-556 (12 pages).
Network intrusion detection systems need to be updated due to the rise in cyber threats. To improve detection accuracy, this research presents a robust strategy that makes use of a stacked ensemble method, combining the advantages of several machine learning models. The ensemble is made up of various base models, such as Decision Trees, K-Nearest Neighbors (KNN), Multi-Layer Perceptrons (MLP), and Naive Bayes, each of which offers a distinct perspective on the properties of the data. The research follows a methodical workflow that begins with thorough data preprocessing to guarantee the accuracy and applicability of the data. Feature engineering is used to extract useful attributes from network traffic data, which are essential for efficient model training. The ensemble approach combines these models by training a Logistic Regression meta-learner on the base models' predictions. In addition to increasing prediction accuracy, this tiered approach helps avoid the drawbacks of individual models. The model's evaluation on a network intrusion dataset shows high accuracy, precision, and recall, indicating its efficacy in identifying malicious activity. Cross-validation is used to ensure the models are reliable and generalize well to new, untested data. In addition to advancing cybersecurity, the research establishes a foundation for implementing flexible and scalable intrusion detection systems. This hybrid, stacked ensemble model has substantial potential for improving cyberattack prevention, lowering the likelihood of cyberattacks, and offering a scalable solution that can be adjusted to meet new threats and technological advancements.
Keywords: Machine learning; Cyber-security; Data preprocessing; Model training
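scikit-learn's StackingClassifier expresses this design directly: the four base learners named in the abstract feed a Logistic Regression meta-learner trained on their out-of-fold predictions; hyperparameters below are illustrative.

```python
# Stacked ensemble: DT, KNN, MLP, and Naive Bayes base models with a
# Logistic Regression meta-learner (hyperparameters illustrative).
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=10)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)),
        ("nb", GaussianNB()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,                       # out-of-fold predictions train the meta-learner
)
# stack.fit(X_train, y_train); stack.score(X_test, y_test)
```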
20. Social Media Data Analysis: A Causal Inference Based Study of User Behavior Patterns
Authors: Liangkeyi Sun. Computational Social Science, 2025, Issue 1, pp. 37-53 (17 pages).
This study conducts an in-depth analysis of social media data using causal inference methods to explore the underlying mechanisms driving user behavior patterns. Leveraging large-scale social media datasets, it develops a systematic analytical framework that integrates techniques such as propensity score matching, regression analysis, and regression discontinuity design to identify the causal effects of content characteristics, user attributes, and social network structures on user interactions, including clicks, shares, comments, and likes. The empirical findings indicate that factors such as sentiment, topical relevance, and network centrality have significant causal impacts on user behavior, with notable differences observed among user groups. The study not only enriches the theoretical understanding of social media data analysis but also provides data-driven decision support and practical guidance for fields such as digital marketing, public opinion management, and digital governance.
Keywords: Social media data; Causal inference; User behavior patterns; Propensity score matching; Regression discontinuity; Data preprocessing
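A bare-bones propensity score matching pass, estimating propensity with logistic regression and matching each treated unit to its nearest control on that score, can be sketched as below; the variables are hypothetical, and real analyses add balance diagnostics and caliper rules.

```python
# Minimal propensity score matching sketch (variables hypothetical;
# no balance checks or calipers).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 3))                   # user covariates (activity, followers, ...)
treat = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # e.g., used a sentiment cue
y = 2.0 * treat + 1.5 * X[:, 0] + rng.normal(size=1000)     # confounded outcome (e.g., shares)

ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]    # propensity scores
t_idx, c_idx = np.where(treat == 1)[0], np.where(treat == 0)[0]

nn = NearestNeighbors(n_neighbors=1).fit(ps[c_idx].reshape(-1, 1))
_, match = nn.kneighbors(ps[t_idx].reshape(-1, 1))    # nearest control for each treated unit
att = np.mean(y[t_idx] - y[c_idx[match[:, 0]]])       # average effect on the treated
print(f"estimated ATT: {att:.2f} (true effect is 2.0)")
```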