Journal Articles
189 articles found
Individual Software Expertise Formalization and Assessment from Project Management Tool Databases
1
Authors: Traian-Radu Plosca, Alexandru-Mihai Pescaru, Bianca-Valeria Rus, Daniel-Ioan Curiac. Computers, Materials & Continua, 2026, No. 1, pp. 389-411 (23 pages)
Objective expertise evaluation of individuals, as a prerequisite stage for team formation, has been a long-term desideratum in large software development companies. Given the rapid advancements in machine learning methods and the reliable data already stored in project management tools' datasets, automating this evaluation process is a natural step forward. In this context, our approach focuses on quantifying software developer expertise by using metadata from task-tracking systems. For this, we mathematically formalize two categories of expertise: technology-specific expertise, which denotes the skills required for a particular technology, and general expertise, which encapsulates overall knowledge in the software industry. Afterward, we automatically classify the zones of expertise associated with each task a developer has worked on, using Bidirectional Encoder Representations from Transformers (BERT)-like transformers to handle the unique characteristics of project tool datasets effectively. Finally, our method evaluates the proficiency of each software specialist across already completed projects from both technology-specific and general perspectives. The method was experimentally validated, yielding promising results.
Keywords: Expertise formalization; transformer-based models; natural language processing; augmented data; project management tool; skill classification
Augmented Industrial Data-Driven Modeling Under the Curse of Dimensionality (Cited by: 5)
2
Authors: Xiaoyu Jiang, Xiangyin Kong, Zhiqiang Ge. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2023, No. 6, pp. 1445-1461 (17 pages)
The curse of dimensionality refers to the problem of increased sparsity and computational complexity when dealing with high-dimensional data. In recent years, the types and variables of industrial data have increased significantly, making data-driven models more challenging to develop. To address this problem, data augmentation technology has been introduced as an effective tool to solve the sparsity problem of high-dimensional industrial data. This paper systematically explores and discusses the necessity, feasibility, and effectiveness of augmented industrial data-driven modeling in the context of the curse of dimensionality and virtual big data. Then, the process of data augmentation modeling is analyzed, and the concept of data boosting augmentation is proposed. Data boosting augmentation involves designing reliability weight and actual-virtual weight functions and developing a double weighted partial least squares model to optimize the three stages of data generation, data fusion, and modeling. This approach significantly improves the interpretability, effectiveness, and practicality of data augmentation in industrial modeling. Finally, the proposed method is verified using practical examples of fault diagnosis systems and virtual measurement systems in industry. The results demonstrate the effectiveness of the proposed approach in improving the accuracy and robustness of data-driven models, making them more suitable for real-world industrial applications.
Keywords: Curse of dimensionality; data augmentation; data-driven modeling; industrial processes; machine learning
Experiments on image data augmentation techniques for geological rock type classification with convolutional neural networks (Cited by: 2)
3
Authors: Afshin Tatar, Manouchehr Haghighi, Abbas Zeinijahromi. Journal of Rock Mechanics and Geotechnical Engineering, 2025, No. 1, pp. 106-125 (20 pages)
The integration of image analysis through deep learning (DL) into rock classification represents a significant leap forward in geological research. While traditional methods remain invaluable for their expertise and historical context, DL offers a powerful complement by enhancing the speed, objectivity, and precision of the classification process. This research explores the significance of image data augmentation techniques in optimizing the performance of convolutional neural networks (CNNs) for geological image analysis, particularly in the classification of igneous, metamorphic, and sedimentary rock types from rock thin section (RTS) images. This study primarily focuses on classic image augmentation techniques and evaluates their impact on model accuracy and precision. Results demonstrate that augmentation techniques like Equalize significantly enhance the model's classification capabilities, achieving an F1-Score of 0.9869 for igneous rocks, 0.9884 for metamorphic rocks, and 0.9929 for sedimentary rocks, which are improvements over the baseline results on the original data. Moreover, the weighted average F1-Score across all classes and techniques is 0.9886, indicating an overall enhancement. Conversely, methods like Distort lead to decreased accuracy and F1-Score, with F1-Scores of 0.949 for igneous rocks, 0.954 for metamorphic rocks, and 0.9416 for sedimentary rocks, performing worse than the baseline. The study underscores the practicality of image data augmentation in geological image classification and advocates for the adoption of DL methods in this domain for automation and improved results. The findings of this study can benefit various fields, including remote sensing, mineral exploration, and environmental monitoring, by enhancing the accuracy of geological image analysis for both scientific research and industrial applications.
Keywords: Deep learning (DL); Image analysis; Image data augmentation; Convolutional neural networks (CNNs); Geological image analysis; Rock classification; Rock thin section (RTS) images
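The abstract reports that Equalize gave the strongest gains. In common imaging libraries (for example, Pillow's `ImageOps.equalize`), "Equalize" means histogram equalization. Assuming that is the intended meaning here, the following is a minimal pure-Python sketch for an 8-bit grayscale image stored as a list of rows. The function name `equalize` is illustrative. The abstract does not describe the paper's own pipeline, such as colour handling for thin-section images or the library used.

```python
def equalize(image):
    """Histogram-equalize an 8-bit grayscale image given as a list of rows."""
    pixels = [p for row in image for p in row]
    if not pixels:
        return []
    n = len(pixels)
    # Histogram over the 256 possible intensity levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function (CDF) of the intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == n:
        # Only one intensity is present: there is nothing to spread out.
        return [list(row) for row in image]
    # Look-up table that stretches the CDF linearly over 0..255.
    lut = [round((c - cdf_min) / (n - cdf_min) * 255) if c else 0 for c in cdf]
    return [[lut[p] for p in row] for row in image]
```

For example, an image containing only the levels 0 and 1 in equal amounts is stretched to 0 and 255. This maximal contrast boost is what can make faint mineral boundaries more visible to a CNN.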
Multi-sensor missile-borne LiDAR point cloud data augmentation based on Monte Carlo distortion simulation (Cited by: 1)
4
Authors: Luda Zhao, Yihua Hu, Fei Han, Zhenglei Dou, Shanshan Li, Yan Zhang, Qilong Wu. CAAI Transactions on Intelligence Technology, 2025, No. 1, pp. 300-316 (17 pages)
Large-scale point cloud datasets form the basis for training various deep learning networks and achieving high-quality network processing tasks. Due to the diversity and robustness constraints of the data, data augmentation (DA) methods are utilised to expand dataset diversity and scale. However, LiDAR point cloud data from different platforms (such as missile-borne and vehicular LiDAR) have complex and distinct characteristics. As a result, directly applying traditional 2D visual-domain DA methods to 3D data can yield networks that do not robustly accomplish the corresponding tasks. To address this issue, the present study explores DA for missile-borne LiDAR point clouds using a Monte Carlo (MC) simulation method that closely resembles practical application. First, a model of the multi-sensor imaging system is established, taking into account the joint errors arising from the platform itself and the relative motion during the imaging process. A distortion simulation method based on MC simulation for augmenting missile-borne LiDAR point cloud data is then proposed, underpinned by an analysis of the combined errors between different modal sensors, achieving high-quality augmentation of point cloud data. The effectiveness of the proposed method in addressing imaging system errors and distortion simulation is validated using the imaging scene dataset constructed in this paper. Comparative experiments between the proposed point cloud DA algorithm and current state-of-the-art (SOTA) algorithms on point cloud detection and single object tracking tasks show that the proposed method improves the network performance obtained from unaugmented datasets by over 17.3% and 17.9%, surpassing current SOTA point cloud DA algorithms.
Keywords: data augmentation; LiDAR; missile-borne imaging; Monte Carlo simulation; point cloud
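As a rough illustration of Monte Carlo distortion simulation, the sketch below draws a random yaw error and a random translation error for each trial. It then applies the resulting rigid transform to an (x, y, z) point cloud. The Gaussian error model, its default sigmas, and the helper `mc_distort` are assumptions made for illustration. The paper's joint platform and relative-motion error model for multi-sensor imaging is considerably more detailed.

```python
import math
import random


def mc_distort(points, yaw_sigma_deg=0.5, trans_sigma=0.05, trials=3, seed=None):
    """Return `trials` distorted copies of an (x, y, z) point cloud.

    Each Monte Carlo trial samples a yaw error (degrees) and a 3D translation
    error from zero-mean Gaussians (a hypothetical error model) and applies
    that rigid transform to every point.
    """
    rng = random.Random(seed)
    copies = []
    for _ in range(trials):
        yaw = math.radians(rng.gauss(0.0, yaw_sigma_deg))
        tx, ty, tz = (rng.gauss(0.0, trans_sigma) for _ in range(3))
        c, s = math.cos(yaw), math.sin(yaw)
        # Rotate about the z axis, then translate.
        copies.append([
            (c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in points
        ])
    return copies
```

Because each trial is a rigid transform, distances between points are preserved. A realistic sensor model would also add per-point effects such as range noise.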
Pre-trained SAM as data augmentation for image segmentation (Cited by: 1)
5
Authors: Junjun Wu, Yunbo Rao, Shaoning Zeng, Bob Zhang. CAAI Transactions on Intelligence Technology, 2025, No. 1, pp. 268-282 (15 pages)
Data augmentation plays an important role in training deep neural models by expanding the size and diversity of the dataset. Initially, data augmentation mainly involved simple transformations of images. Later, more advanced methods appeared to increase the diversity and complexity of data, and these evolved into sophisticated generative models. However, these methods require massive computation for training or searching. In this paper, a novel training-free method, PTSAM-DA, is proposed. It uses the pre-trained Segment Anything Model (SAM) as a data augmentation tool to generate augmented annotations for images. Without the need for training, it obtains prompt boxes from the original annotations and then feeds the boxes to the pre-trained SAM to generate diverse and improved annotations. In this way, annotations are augmented more ingeniously than with simple manipulations, without incurring the heavy computation of training a data augmentation model. Multiple comparative experiments are conducted on three datasets: an in-house dataset, ADE20K, and COCO2017. On the in-house dataset, namely the Agricultural Plot Segmentation Dataset, maximum improvements of 3.77% and 8.92% are gained in two mainstream metrics, mIoU and mAcc, respectively. Consequently, large vision models like SAM are shown to be promising not only in image segmentation but also in data augmentation.
Keywords: data augmentation; image segmentation; large model; segment anything model
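PTSAM-DA derives prompt boxes from existing annotations before querying SAM. The sketch below covers only that first step. It assumes annotations arrive as an instance label map, with one integer id per object and 0 for background, and computes the tight bounding box of every instance. `label_map_to_boxes` is a hypothetical helper. The boxes use inclusive pixel coordinates in (x_min, y_min, x_max, y_max) order, which is the XYXY layout the `box` argument of `SamPredictor.predict` in the reference segment-anything code expects. The SAM call itself is not shown.

```python
def label_map_to_boxes(label_map, background=0):
    """Map each instance id in a 2D label map to its tight XYXY bounding box."""
    boxes = {}
    for y, row in enumerate(label_map):
        for x, label in enumerate(row):
            if label == background:
                continue
            # Grow the running box for this instance to include (x, y).
            x0, y0, x1, y1 = boxes.get(label, (x, y, x, y))
            boxes[label] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes
```

Each resulting box would be sent to SAM as a prompt, and the returned mask would serve as an augmented version of that instance's annotation.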
Data augmentation method for light guide plate based on improved CycleGAN
6
Authors: GONG Yefei, YAN Chao, XIAO Ming, LU Mingli, GAO Hua. Optoelectronics Letters, 2025, No. 9, pp. 555-561 (7 pages)
An improved cycle-consistent generative adversarial network (CycleGAN) method for defect data augmentation is proposed, based on feature fusion and a self-attention residual module. It addresses the insufficiency of defect sample data for light guide plates (LGPs) in production, as well as the problem of minor defects. Two optimizations are made to the CycleGAN generator. First, low-resolution features obtained from partial up-sampling and down-sampling are fused with high-resolution features. Second, a self-attention mechanism is combined with the residual network structure to replace the original residual module. Qualitative and quantitative experiments were conducted to compare different data augmentation methods. The results show that the LGP defect images generated by the improved network were more realistic and that the accuracy of the You Only Look Once version 5 (YOLOv5) detection network for LGPs improved by 5.6%, demonstrating the effectiveness and accuracy of the proposed method.
Keywords: feature fusion; self-attention mechanism; data augmentation; light guide plate (LGP); CycleGAN; low-resolution features; defect data augmentation; self-attention residual module; minor defects
Syn-Aug: An Effective and General Synchronous Data Augmentation Framework for 3D Object Detection
7
Authors: Huaijin Liu, Jixiang Du, Yong Zhang, Hongbo Zhang, Jiandian Zeng. CAAI Transactions on Intelligence Technology, 2025, No. 3, pp. 912-928 (17 pages)
Data augmentation plays an important role in boosting the performance of 3D models, yet very few studies apply this technique to 3D point cloud data. Global augmentation and cut-paste are commonly used augmentation techniques for point clouds. Global augmentation is applied to the entire point cloud of the scene, and cut-paste samples objects from other frames into the current frame. Both types of data augmentation can improve performance. However, the cut-paste technique cannot effectively handle the occlusion relationship between the foreground object and the background scene or ensure the rationality of object sampling, so it may be counterproductive and hurt overall performance. In addition, LiDAR is susceptible to signal loss, external occlusion, extreme weather, and other factors that can easily change object shapes, while global augmentation and cut-paste cannot effectively make the model robust to such changes. To this end, we propose Syn-Aug, a synchronous data augmentation framework for LiDAR-based 3D object detection. Specifically, we first propose a novel rendering-based object augmentation technique (Ren-Aug) to enrich training data while enhancing scene realism. Second, we propose a local augmentation technique (Local-Aug) that generates local noise by rotating and scaling objects in the scene while avoiding collisions, which can improve generalisation performance. Finally, we make full use of the structural information of 3D labels to make the model more robust by randomly changing the geometry of objects in the training frames. We verify the proposed framework with four different types of 3D object detectors. Experimental results show that Syn-Aug significantly improves the performance of various 3D object detectors on the KITTI and nuScenes datasets, proving its effectiveness and generality. On KITTI, four different types of baseline models using Syn-Aug improved mAP by 0.89%, 1.35%, 1.61%, and 1.14%, respectively. On nuScenes, four different types of baseline models using Syn-Aug improved mAP by 14.93%, 10.42%, 8.47%, and 6.81%, respectively. The code is available at https://github.com/liuhuaijjin/Syn-Aug.
Keywords: 3D object detection; data augmentation; diversity; generalization; point cloud; robustness
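Local-Aug rotates and scales individual objects while avoiding collisions. The sketch below shows one way to do this under simplifying assumptions. Each object is a dict (hypothetical layout) holding a bird's-eye-view center, a bounding-circle radius, and its points. A transform is skipped whenever the scaled circle would overlap another object's circle, and scaling is applied in the ground plane only. The abstract does not give the paper's actual collision test or parameter ranges.

```python
import math
import random


def local_augment(objects, max_rot_deg=15.0, scale_range=(0.9, 1.1), seed=None):
    """Rotate and scale each object about its own center, skipping colliding moves.

    `objects`: list of dicts with 'center' (x, y), 'radius' (bird's-eye-view
    bounding circle) and 'points' [(x, y, z), ...]. The inputs are not mutated.
    """
    rng = random.Random(seed)
    result = [dict(o) for o in objects]
    for i, obj in enumerate(result):
        theta = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
        scale = rng.uniform(*scale_range)
        new_radius = obj["radius"] * scale
        # Rotation about the center keeps the circle; scaling grows or shrinks it.
        collides = any(
            math.dist(obj["center"], other["center"]) < new_radius + other["radius"]
            for j, other in enumerate(result)
            if j != i
        )
        if collides:
            continue  # keep this object unchanged
        cx, cy = obj["center"]
        c, s = math.cos(theta), math.sin(theta)
        obj["points"] = [
            (cx + scale * (c * (x - cx) - s * (y - cy)),
             cy + scale * (s * (x - cx) + c * (y - cy)),
             z)  # heights left unchanged in this simplified version
            for x, y, z in obj["points"]
        ]
        obj["radius"] = new_radius
    return result
```

A real detector pipeline would transform the 3D box label together with its points and use an oriented-box overlap test instead of circles.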
GAN-based data augmentation of time series for fault diagnosis in railway track
8
Authors: Héctor A. Fernández-Bobadilla, Yahya Bouchikhi, Ullrich Martin. Railway Engineering Science, 2025, No. 4, pp. 642-683 (42 pages)
Supervised learning classification has arisen as a powerful tool for data-driven fault diagnosis in dynamical systems, achieving astonishing results. This approach assumes the availability of extensive, diverse, and labeled data corpora for training. However, in some applications it may be difficult or infeasible to obtain a large and balanced dataset that includes enough representative instances of the fault behaviors of interest. This leads to data scarcity and class imbalance, which greatly affect the performance of supervised learning classifiers. Datasets from railway systems are usually both scarce and imbalanced, making supervised learning-based fault diagnosis a highly challenging task. This article addresses time-series data augmentation for fault diagnosis and presents two application cases in the context of railway track. The case studies employ generative adversarial network (GAN) schemes to produce realistic synthetic samples of geometrical and structural track defects. The goal is to generate samples that enhance fault diagnosis performance. Therefore, major attention was paid not only to the generation process but also to the assessment of synthesis quality, to guarantee that the samples are suitable for training supervised learning classification models. In the first application, a convolutional classifier achieved a test accuracy of 87.5% in the train on synthetic, test on real (TSTR) scenario. In the second application, a fully-connected classifier achieved 96.18% test accuracy for TSTR. The results indicate that the proposed augmentation approach produces samples with statistical characteristics equivalent to real data that lead to similar classification behavior.
Keywords: data augmentation; Time series; Generative adversarial networks; Fault diagnosis; Predictive maintenance; Railway systems
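The TSTR protocol in the abstract means fitting a classifier only on generated samples and scoring it only on real ones. The minimal sketch below uses a nearest-centroid classifier as a stand-in for the paper's convolutional and fully-connected networks and treats each time-series window as a flat feature vector. Both choices are assumptions, and the helper names are hypothetical.

```python
import math


def fit_centroids(samples, labels):
    """Mean feature vector per class (a nearest-centroid 'model')."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}


def predict(centroids, x):
    """Label of the centroid closest to x in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def tstr_accuracy(synthetic, synthetic_labels, real, real_labels):
    """Train on synthetic, test on real: fit on generated data, score on real data only."""
    centroids = fit_centroids(synthetic, synthetic_labels)
    hits = sum(predict(centroids, x) == y for x, y in zip(real, real_labels))
    return hits / len(real)
```

Comparing this score with the train-on-real, test-on-real score shows how well synthetic samples can substitute for real ones.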
Optimization of convolutional neural networks for predicting water pollutants using spectral data in the middle and lower reaches of the Yangtze River Basin, China
9
Authors: ZHANG Guohao, LI Song, WANG Cailing, WANG Hongwei, YU Tao, DAI Xiaoxu. Journal of Mountain Science, 2025, No. 8, pp. 2851-2869 (19 pages)
Developing an accurate and efficient comprehensive water quality prediction model and assessment method is crucial for the prevention and control of water pollution. Deep learning (DL), one of the most promising technologies today, plays a crucial role in the effective assessment of water body health, which is essential for water resource management. This study builds models using both the original dataset and a dataset augmented with Generative Adversarial Networks (GAN). It integrates optimization algorithms (OA) with Convolutional Neural Networks (CNN) to propose a comprehensive water quality model evaluation method aimed at identifying the optimal models for different pollutants. Specifically, after the spectral dataset was preprocessed, data augmentation was conducted to obtain two datasets. Then, six new models were developed on these datasets by combining CNN with particle swarm optimization (PSO), genetic algorithm (GA), and simulated annealing (SA). These models simulate and forecast the concentrations of three water pollutants: Chemical Oxygen Demand (COD), Total Nitrogen (TN), and Total Phosphorus (TP). Finally, seven model evaluation methods, including uncertainty analysis, were used to evaluate the constructed models and select the optimal models for the three pollutants. The evaluation results indicate that the GPSCNN model performed best in predicting COD and TP concentrations, while the GGACNN model excelled in TN concentration prediction. Compared with existing technologies, the proposed models and evaluation methods provide a more comprehensive and rapid approach to water body prediction and assessment, offering new insights and methods for water pollution prevention and control.
Keywords: Water pollutants; Convolutional neural networks; data augmentation; Optimization algorithms; Model evaluation methods; Deep Learning
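The PSO-based models tune a CNN with particle swarm optimization. The sketch below is a basic global-best PSO that minimizes any function over box bounds. In the study's setting, the objective would presumably be something like a CNN's validation error as a function of its hyperparameters, but the abstract does not say exactly what is optimized. The inertia and acceleration constants are common textbook defaults, not values from the paper, and `pso_minimize` is a hypothetical name.

```python
import random


def pso_minimize(f, bounds, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5, seed=None):
    """Minimize f over the box `bounds` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Usage: `pso_minimize(lambda p: sum(v * v for v in p), [(-5.0, 5.0), (-5.0, 5.0)], seed=0)` converges toward the origin. For hyperparameter search, `f` would train and validate a model, so each call would be expensive.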
Bird Species Classification Using Image Background Removal for Data Augmentation
10
Authors: Yu-Xiang Zhao, Yi Lee. Computers, Materials & Continua, 2025, No. 7, pp. 791-810 (20 pages)
Bird species classification is not only a challenging topic in artificial intelligence but also a domain closely related to environmental protection and ecological research. Additionally, performing edge computing on low-end devices using small neural networks is an important research direction. In this paper, we use the EfficientNetV2B0 model for bird species classification, applying transfer learning on a dataset of 525 bird species. We also employ the BiRefNet model to remove backgrounds from images in the training set. The generated background-removed images are mixed with the original training set as a form of data augmentation. We intend these background-removed images to help the model focus on key features. By combining data augmentation with transfer learning, we trained a highly accurate and efficient bird species classification model. The training process is divided into a transfer learning stage and a fine-tuning stage. In the transfer learning stage, only the newly added custom layers are trained. In the fine-tuning stage, all pre-trained layers except the batch normalization layers are fine-tuned. According to the experimental results, the proposed model is smaller than the other models compared and also outperforms them on various metrics. The proposed model achieved an accuracy of 99.54% and a precision of 99.62%, demonstrating that it combines a lightweight design with high accuracy. To confirm the credibility of the results, we use heatmaps to interpret the model. The heatmaps show that our model clearly highlights the image feature areas. We also perform 10-fold cross-validation on the model to verify its credibility. Overall, this paper proposes a model with low training cost and high accuracy, making it suitable for deployment on edge computing devices to provide lighter and more convenient services.
Keywords: Bird species classification; edge computing; EfficientNet; BiRefNet; data augmentation
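The augmentation step described in this abstract mixes background-removed copies into the original training set. The sketch below shows the mixing itself. It assumes a foreground mask per image is already available; in the paper this mask comes from BiRefNet, which is not reproduced here. Images are represented as rows of RGB tuples. The white fill colour and the helper names are illustrative choices.

```python
def remove_background(image, mask, fill=(255, 255, 255)):
    """Replace every pixel whose mask value is 0 with a flat fill colour."""
    return [
        [pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def mix_training_set(images, masks, labels):
    """Original (image, label) pairs followed by one background-removed copy of each."""
    augmented = [
        (remove_background(img, m), y) for img, m, y in zip(images, masks, labels)
    ]
    return list(zip(images, labels)) + augmented
```

Keeping the originals alongside the cut-outs lets the classifier still see natural backgrounds at test time while being nudged toward the bird itself.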
Advancing predictive accuracy of shallow landslide using strategic data augmentation
11
Authors: Hongzhi Qiu, Xiaoqing Chen, Peng Feng, Renchao Wang, Wang Hu, Liping Zhang, Alessandro Pasuto. Journal of Rock Mechanics and Geotechnical Engineering, 2025, No. 7, pp. 4273-4287 (15 pages)
Rainfall-induced shallow landslides are a significant geological hazard, necessitating precise monitoring and prediction for effective disaster mitigation. Most studies on landslide prediction have focused on optimizing machine learning (ML) algorithms, while very limited attention has been paid to enhancing data quality for improved predictive performance. This study employs strategic data augmentation (DA) techniques to enhance the accuracy of shallow landslide prediction. It uses five DA methods: singular spectrum analysis (SSA), moving averages (MA), wavelet denoising (WD), variational mode decomposition (VMD), and linear interpolation (LI). With these, we apply strategies such as smoothing, denoising, trend decomposition, and synthetic data generation to improve the training dataset. Four machine learning algorithms, i.e. artificial neural network (ANN), recurrent neural network (RNN), one-dimensional convolutional neural network (CNN1D), and long short-term memory (LSTM), are used to forecast landslide displacement. A case study of a landslide in southwest China shows the effectiveness of our approach in predicting landslide displacements despite the inherent limitations of the monitoring dataset. VMD proves the most effective for smoothing and denoising, improving R², RMSE, and MAPE by 172.16%, 71.82%, and 98.9%, respectively. SSA addresses missing data, while LI is effective with limited data samples, improving these metrics by 21.6%, 52.59%, and 47.87%, respectively. This study demonstrates the potential of DA techniques to mitigate the impact of data defects on landslide prediction accuracy, with implications for similar cases.
Keywords: Shallow landslide; data augmentation; Machine learning; Neural network; Deformation prediction
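Two of the five DA methods in this abstract, moving averages and linear interpolation, are simple enough to sketch directly for a displacement series. The window handling and the upsampling factor `k` are assumptions, since the abstract does not state the paper's settings. Here the window is trailing and shrinks at the start of the series.

```python
def moving_average(series, window):
    """Trailing moving average; the first window-1 outputs average the points seen so far."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def linear_upsample(series, k):
    """Insert k linearly interpolated points between each pair of consecutive samples."""
    out = []
    for a, b in zip(series, series[1:]):
        out.append(a)
        out.extend(a + (b - a) * j / (k + 1) for j in range(1, k + 1))
    if series:
        out.append(series[-1])
    return out
```

Smoothing removes high-frequency noise before training. Interpolation multiplies the number of training points when monitoring records are sparse, at the cost of assuming displacement changes linearly between readings.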
Predicting Concrete Strength Using Data Augmentation Coupled with Multiple Optimizers in Feedforward Neural Networks
12
Authors: Sandeerah Choudhary, Qaisar Abbas, Tallha Akram, Irshad Qureshi, Mutlaq B. Aldajani, Hammad Salahuddin. Computer Modeling in Engineering & Sciences, 2025, No. 11, pp. 1755-1787 (33 pages)
The increasing demand for sustainable construction practices has led to growing interest in recycled aggregate concrete (RAC) as an eco-friendly alternative to conventional concrete. However, predicting its compressive strength remains a challenge due to the variability in recycled materials and mix design parameters. This study presents a robust machine learning framework for predicting the compressive strength of recycled aggregate concrete using feedforward neural networks (FFNN), Random Forest (RF), and XGBoost. A literature-derived dataset of 502 samples was enriched via interpolation-based data augmentation. It was then modeled using five distinct optimization techniques within MATLAB's Neural Net Fitting module: Bayesian Regularization, Levenberg-Marquardt, and three conjugate gradient variants (Powell/Beale Restarts, Fletcher-Powell, and Polak-Ribiere). Hyperparameter tuning, dropout regularization, and early stopping were employed to enhance generalization. Comparative analysis revealed that FFNN outperformed RF and XGBoost, achieving an R² of 0.9669. To ensure interpretability, accumulated local effects (ALE) and partial dependence plots (PDP) were used, revealing trends consistent with existing domain knowledge. This allows strength to be estimated from mix properties without extensive lab testing. Designers can thus track performance and sustainability trends in concrete mix designs while promoting responsible use of construction and demolition waste.
Keywords: Feedforward neural networks; recycled aggregates; compressive strength prediction; optimization techniques; data augmentation; grid search
A solution framework for the experimental data shortage problem of lithium-ion batteries: Generative adversarial network-based data augmentation for battery state estimation
13
Authors: Jinghua Sun, Ankun Gu, Josef Kainz. Journal of Energy Chemistry, 2025, No. 4, pp. 476-497 (22 pages)
To address the widespread data shortage problem in battery research, this paper proposes a generative adversarial network (GAN) model that combines deep convolutional networks, the Wasserstein distance, and the gradient penalty to achieve data augmentation. To lower the threshold for implementing the proposed method, transfer learning is further introduced, forming the W-DC-GAN-GP-TL framework. This framework is evaluated on three different publicly available datasets to judge the quality of the generated data. Visual comparisons and two visualization methods, probability density function (PDF) and principal component analysis (PCA), show that the generated data are hard to distinguish from the real data. The use of generated data for training a battery state model via transfer learning is further evaluated. Specifically, Bi-GRU-based and Transformer-based methods are implemented on two separate datasets for estimating state of health (SOH) and state of charge (SOC), respectively. The results indicate that the proposed framework performs satisfactorily in different scenarios. In the data replacement scenario, where real data are removed and replaced with generated data, state estimator accuracy decreases only slightly. In the data enhancement scenario, estimator accuracy is further improved. After applying the proposed framework, the SOH and SOC estimation errors are as low as 0.69% and 0.58% root mean square error (RMSE), respectively. This framework provides a reliable method for enriching battery measurement data. It is a generalized framework capable of generating a variety of time series data.
Keywords: Lithium-ion battery; Generative adversarial network; data augmentation; State of health; State of charge; data shortage
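For reference, the gradient-penalty Wasserstein objective named in this framework is usually written as shown below. Here D is the critic, G is the generator, P_r is the real data distribution, and P_g is the generator's distribution. The original WGAN-GP formulation used λ = 10. The abstract does not state whether the W-DC-GAN-GP-TL framework keeps this exact form or this value of λ.

```latex
\begin{aligned}
L_D &= \mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}\big[D(\tilde{x})\big]
      - \mathbb{E}_{x\sim\mathbb{P}_r}\big[D(x)\big]
      + \lambda\,\mathbb{E}_{\hat{x}\sim\mathbb{P}_{\hat{x}}}
        \Big[\big(\lVert\nabla_{\hat{x}} D(\hat{x})\rVert_2 - 1\big)^2\Big],\\
\hat{x} &= \epsilon\,x + (1-\epsilon)\,\tilde{x},\qquad
      \epsilon\sim U[0,1],\quad \tilde{x} = G(z),\\
L_G &= -\,\mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}\big[D(\tilde{x})\big].
\end{aligned}
```

The penalty term softly enforces the 1-Lipschitz constraint on the critic along lines between real and generated samples, replacing the weight clipping of the original Wasserstein GAN.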
On the Data Quality and Imbalance in Machine Learning-based Design and Manufacturing-A Systematic Review
14
Authors: Jiarui Xie, Lijun Sun, Yaoyao Fiona Zhao. Engineering, 2025, No. 2, pp. 105-131 (27 pages)
Machine learning (ML) has recently enabled many modeling tasks in design, manufacturing, and condition monitoring due to its unparalleled ability to learn from existing data. Data have become the limiting factor when implementing ML in industry. However, there is no systematic investigation of how data quality can be assessed and improved for ML-based design and manufacturing. The aim of this survey is to uncover the data challenges in this domain and review the techniques used to resolve them. To establish the background for the subsequent analysis, crucial data terminologies in ML-based modeling are reviewed and categorized into data acquisition, management, analysis, and utilization. Thereafter, the concepts and frameworks established to evaluate data quality and imbalance are further investigated, including data quality assessment, data readiness, information quality, data biases, fairness, and diversity. The root causes and types of data challenges are identified and summarized, including human factors, complex systems, complicated relationships, lack of data quality, data heterogeneity, data imbalance, and data scarcity. Methods to improve data quality and mitigate data imbalance, and their applications in this domain, are reviewed. This literature review focuses on two promising methods: data augmentation and active learning. The strengths, limitations, and applicability of the surveyed techniques are illustrated. Trends in data augmentation and active learning are discussed with respect to their applications, data types, and approaches. Based on this discussion, future directions for data quality improvement and data imbalance mitigation in this domain are identified.
Keywords: Machine learning; Design and manufacturing; data quality; data augmentation; Active learning
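Active learning, one of the two methods this review focuses on, chooses which unlabeled samples to send for labeling. The minimal sketch below implements least-confidence uncertainty sampling, one common query strategy. It ranks samples by the model's highest predicted class probability. This strategy was chosen here for illustration and is not taken from the review. `least_confident` is a hypothetical helper.

```python
def least_confident(probabilities, budget):
    """Return indices of the `budget` samples whose top predicted probability is lowest.

    `probabilities` holds one list of class probabilities per unlabeled sample.
    """
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:budget]
```

In a full loop, the selected samples would be labeled (for example, by running the corresponding experiments), added to the training set, and the model retrained before the next query. This concentrates costly labeling effort where the model is least certain.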
Prediction of abnormal TBM disc cutter wear in mixed ground condition using interpretable machine learning with data augmentation
15
Authors: Kibeom Kwon, Hangseok Choi, Jaehoon Jung, Dongku Kim, Young Jin Shin. Journal of Rock Mechanics and Geotechnical Engineering, 2025, No. 4, pp. 2059-2071 (13 pages)
The widespread adoption of tunnel boring machines (TBMs) has led to an increased focus on disc cutter wear, including both normal and abnormal types, for efficient and safe TBM excavation. However, abnormal wear has yet to be thoroughly investigated. This is primarily due to the complexity of accounting for mixed ground conditions and the imbalance in the number of instances between the two types of wear. This study developed a prediction model for abnormal TBM disc cutter wear under mixed ground conditions by employing interpretable machine learning with data augmentation. An equivalent elastic modulus was used to capture the characteristics of mixed ground conditions. Wear data were obtained from 65 cutterhead intervention (CHI) reports covering both mixed ground and hard rock sections. With a balanced training dataset obtained by data augmentation, an extreme gradient boosting (XGB) model delivered acceptable results, with an accuracy of 0.94, an F1-score of 0.808, and a recall of 0.8. In addition, the accuracy for each individual disc cutter exhibited low variability. Recall improved significantly with data augmentation compared with the model trained without it, although the differences in accuracy and F1-score were marginal. The subsequent model interpretation identified chamber pressure, cutter installation radius, and torque as significant contributors. Specifically, a threshold in chamber pressure was observed that could induce abnormal wear. The study also explored how elevated values of these influential contributors correlate with abnormal wear. The proposed model offers a valuable tool for planning the replacement of abnormally worn disc cutters, enhancing the safety and efficiency of TBM operations.
Keywords: Disc cutter; Abnormal wear; Mixed ground; Interpretable machine learning; Data augmentation
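The reported metrics are linked: F1 is the harmonic mean of precision and recall, so if the F1-score of 0.808 and the recall of 0.8 are both computed for the same (abnormal-wear) class, they imply a precision of roughly 0.816. The minimal Python sketch below computes these metrics from confusion-matrix counts and backs out precision from F1 and recall. The counts and function names are hypothetical illustrations, not data or code from the study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive (e.g. abnormal-wear) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def precision_from_f1_recall(f1: float, recall: float) -> float:
    """Invert F1 = 2PR / (P + R) for P, given F1 and R (requires 2R > F1)."""
    return f1 * recall / (2 * recall - f1)


# Hypothetical confusion counts for a small, imbalanced test set.
print(classification_metrics(tp=8, fp=2, fn=2, tn=88))
# Precision implied by the abstract's F1 = 0.808 and recall = 0.8.
print(round(precision_from_f1_recall(0.808, 0.8), 3))
```

Note that on imbalanced data a high accuracy can coexist with a modest recall, which is why the abstract reports recall separately when judging the effect of augmentation.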
ONTDAS: An Optimized Noise-Based Traffic Data Augmentation System for Generalizability Improvement of Traffic Classifiers
16
Authors: Rongwei Yu, Jie Yin, Jingyi Xiang, Qiyun Shao, Lina Wang. Computers, Materials & Continua, 2025, Issue 7, pp. 365-391 (27 pages)
With the emergence of new attack techniques, traffic classifiers often fail to maintain their expected performance in real-world network environments. To generalize well enough to handle unknown malicious samples, they require a large number of new samples for retraining. Given the cost of data collection and labeling, data augmentation is an ideal solution. We propose an optimized noise-based traffic data augmentation system, ONTDAS. The system uses a gradient-based search algorithm and an improved Bayesian optimizer to obtain optimized noise, which is injected into the original samples for data augmentation. An improved bagging algorithm then integrates all the base traffic classifiers trained on the noised datasets. ONTDAS is evaluated with 6 types of base classifiers on 4 publicly available datasets. The results show that ONTDAS can effectively enhance the traffic classifiers' performance and significantly improve their generalizability on unknown malicious samples. The system can also alleviate dataset imbalance. Moreover, the performance of ONTDAS is significantly superior to that of the existing data augmentation methods it was compared against.
Keywords: Unknown malicious traffic classification; data augmentation; optimized noise; generalizability improvement; ensemble learning
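The ONTDAS pipeline injects noise into training samples, trains one base classifier per noised dataset, and combines the classifiers with an improved bagging algorithm. The Python sketch below shows only the generic skeleton of that idea under two simplifying assumptions: it draws plain Gaussian noise (ONTDAS instead searches for optimized noise with a gradient-based algorithm and a Bayesian optimizer), and it combines predictions by unweighted majority vote (the paper's improved bagging is not reproduced). All names, including `noised_copies` and `majority_vote`, are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Sequence

Sample = list[float]


def noised_copies(samples: Sequence[Sample], sigma: float, copies: int, seed: int = 0) -> list[list[Sample]]:
    """Return `copies` versions of the dataset, each with i.i.d. Gaussian noise
    (an assumption; not ONTDAS's optimized noise) added to every feature."""
    rng = random.Random(seed)
    return [
        [[x + rng.gauss(0.0, sigma) for x in sample] for sample in samples]
        for _ in range(copies)
    ]


def majority_vote(classifiers: Sequence[Callable[[Sample], str]], sample: Sample) -> str:
    """Combine base classifiers by unweighted majority vote.
    Ties go to the label first encountered, per Counter.most_common."""
    votes = Counter(clf(sample) for clf in classifiers)
    return votes.most_common(1)[0][0]


train = [[0.2, 0.1], [0.9, 0.8]]
datasets = noised_copies(train, sigma=0.05, copies=3)
# Hypothetical stand-ins for classifiers trained on each noised dataset.
base = [lambda s, t=t: "malicious" if sum(s) > t else "benign" for t in (0.9, 1.0, 1.1)]
print(len(datasets), majority_vote(base, [0.5, 0.5]))
```

Training each base model on a differently perturbed copy is what gives the ensemble its diversity; replacing the Gaussian draw with a learned noise search is the part ONTDAS optimizes.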
Data-Enhanced Low-Cycle Fatigue Life Prediction Model Based on Nickel-Based Superalloys
17
Authors: Luopeng Xu, Lei Xiong, Rulun Zhang, Jiajun Zheng, Huawei Zou, Zhixin Li, Xiaopeng Wang, Qingyuan Wang. Acta Mechanica Solida Sinica, 2025, Issue 4, pp. 612-623 (12 pages)
To overcome the challenges of limited experimental data and improve on the accuracy of empirical formulas, we propose a low-cycle fatigue (LCF) life prediction model for nickel-based superalloys using a data augmentation method. This method uses a variational autoencoder (VAE) to generate low-cycle fatigue data and form an augmented dataset. The Pearson correlation coefficient (PCC) is employed to verify the similarity of feature distributions between the original and augmented datasets. Six machine learning models, namely random forest (RF), artificial neural network (ANN), support vector machine (SVM), gradient-boosted decision tree (GBDT), eXtreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost), are used to predict the LCF life of nickel-based superalloys. Results indicate that the proposed VAE-based data augmentation method can effectively expand the dataset, and that the CatBoost model achieves a mean absolute error (MAE) of 0.0242, a root mean square error (RMSE) of 0.0391, and an R-squared (R²) of 0.9538, outperforming the other models. The proposed method reduces the cost and time associated with LCF experiments and accurately establishes the relationship between fatigue characteristics and the LCF life of nickel-based superalloys.
Keywords: Nickel-based superalloy; Low-cycle fatigue (LCF); Fatigue life prediction; Data augmentation method; Machine learning model; Variational autoencoder (VAE)
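The abstract checks the augmented data with the Pearson correlation coefficient and scores the regressors with MAE, RMSE, and R². For reference, a dependency-free Python sketch of these standard formulas follows. The abstract does not say exactly how PCC is applied to compare the original and VAE-generated feature distributions, so the helpers below are generic, and the toy values in the usage lines are hypothetical, not data from the study.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict[str, float]:
    """MAE, RMSE and the coefficient of determination R^2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"mae": mae, "rmse": rmse, "r2": 1.0 - ss_res / ss_tot}


# Toy (hypothetical) target values, not data from the study.
y_true = [3.0, 3.5, 4.0, 4.5]
y_pred = [3.1, 3.4, 4.0, 4.6]
print(regression_metrics(y_true, y_pred))
print(pearson(y_true, y_pred))
```

RMSE is always at least as large as MAE because it squares errors before averaging, which is consistent with the reported CatBoost values (0.0391 vs. 0.0242).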
Data Augmentation: A Multi-Perspective Survey on Data, Methods, and Applications
18
Authors: Canlin Cui, Junyu Yao, Heng Xia. Computers, Materials & Continua, 2025, Issue 12, pp. 4275-4306 (32 pages)
High-quality data is essential for the success of data-driven learning tasks. The characteristics, precision, and completeness of datasets critically determine the reliability, interpretability, and effectiveness of subsequent analyses and applications, such as fault detection, predictive maintenance, and process optimization. However, for many industrial processes, obtaining sufficient high-quality data remains a significant challenge due to high costs, safety concerns, and practical constraints. To overcome these challenges, data augmentation has emerged as a rapidly growing research area, attracting considerable attention in both academia and industry. By expanding datasets, data augmentation techniques enable better generalization and more robust performance in real-world applications. This paper provides a comprehensive, multi-perspective review of data augmentation methods for industrial processes. For clarity, existing studies are systematically grouped into four categories: small sample with low dimension, small sample with high dimension, large sample with low dimension, and large sample with high dimension. Within this framework, the review examines current research from both methodological and application-oriented perspectives, highlighting the main methods, their advantages, and their limitations. By synthesizing these findings, the review offers a structured overview for scholars and practitioners and serves as a reference for both newcomers and experienced researchers seeking to explore and advance data augmentation techniques in industrial processes.
Keywords: Data-driven; data augmentation; big data; industrial application
Enhancing Medical Image Classification with BSDA-Mamba: Integrating Bayesian Random Semantic Data Augmentation and Residual Connections
19
Authors: Honglin Wang, Yaohua Xu, Cheng Zhu. Computers, Materials & Continua, 2025, Issue 6, pp. 4999-5018 (20 pages)
Medical image classification is crucial in disease diagnosis, treatment planning, and clinical decision-making. We introduce BSDA-Mamba, a novel medical image classification approach that integrates Bayesian Random Semantic Data Augmentation (BSDA) with MedMamba, a Vision Mamba-based model for medical image classification, enhanced by residual connection blocks. BSDA augments medical image data semantically, enhancing the model's generalization ability and classification performance. MedMamba, a deep learning-based state space model, excels at capturing long-range dependencies in medical images. By incorporating residual connections, BSDA-Mamba further improves feature extraction. Through comprehensive experiments on eight medical image datasets, we demonstrate that BSDA-Mamba outperforms existing models in accuracy, area under the curve, and F1-score. Our results highlight BSDA-Mamba's potential as a reliable tool for medical image analysis, particularly in handling diverse imaging modalities from X-rays to MRI. Open-sourcing our model's code and datasets will facilitate the reproduction and extension of our work.
Keywords: Deep learning; medical image classification; data augmentation; visual state space model
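For readers unfamiliar with residual connections, the general form is shown below: a block learns a residual mapping $\mathcal{F}$ and adds its input back through an identity shortcut. This is the standard textbook formulation, assuming input and output have the same dimension; it is not the exact BSDA-Mamba block, whose internal layers are simply lumped into $\mathcal{F}$ with parameters $W$ here.

```latex
% Residual (identity-shortcut) block and its input Jacobian
\mathbf{y} = \mathcal{F}(\mathbf{x}; W) + \mathbf{x},
\qquad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}
  = \frac{\partial \mathcal{F}(\mathbf{x}; W)}{\partial \mathbf{x}} + I
```

The identity term $I$ keeps a direct gradient path through the block even when the residual branch's Jacobian is small, which is the usual explanation for why residual connections ease the training of deeper feature extractors.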
Deep Learning-Based Health Assessment Method for Benzene-to-Ethylene Ratio Control Systems under Incomplete Data
20
Authors: Huichao Cao, Honghe Du, Dongnian Jiang, Wei Li, Lei Du, Jianfeng Yang. Structural Durability & Health Monitoring, 2025, Issue 5, pp. 1305-1325 (21 pages)
In modern industrial production, accurately assessing a system's health state and tracing non-optimal factors are key to ensuring "safe, stable, long-term, full-load, and optimal" operation of the production process. The benzene-to-ethylene ratio control system is a complex system based on an MPC-PID double-layer architecture. Taking into account the interaction between layers, the coupling between loops, and incomplete operating data, this paper proposes a health assessment method for this double-layer control system that makes comprehensive use of deep learning. First, based on a pre-assessment of the system's layers and loops using multivariate statistical methods, seven characteristic parameters that significantly affect the system's health state are identified. Next, because the uneven distribution of actual operating health states leaves the assessment dataset incomplete, the original imbalanced dataset is augmented using a Wasserstein generative adversarial network with a gradient penalty term, yielding a complete dataset that characterizes all health states of the system. On this basis, a new deep learning-based health assessment framework for the benzene-to-ethylene ratio control system is built on top of traditional multivariate statistical assessment. This framework can overcome the shortcomings of linear weighted fusion in handling the coupling and nonlinearity of subsystem health states across layers, and it reduces the dependence on prior knowledge. Furthermore, by introducing a dynamic attention mechanism (AM) into a convolutional neural network (CNN), an assessment model that integrates both assessment and traceability is constructed, enabling health assessment of complex double-layer control systems and tracing of their non-optimal factors. Finally, the effectiveness and superiority of the proposed method are verified on the benzene-to-ethylene ratio control system of the alkylation process unit in a styrene plant.
Keywords: Benzene-to-ethylene ratio control system; health assessment; data augmentation; Wasserstein generative adversarial network with gradient penalty term; dynamic attention mechanism in the convolutional neural network
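For reference, the standard WGAN-GP objectives are shown below, with critic $D$, real-data distribution $P_r$, and generator distribution $P_g$. The abstract does not state the authors' exact variant or penalty weight, so the penalty coefficient $\lambda$ (commonly set to 10) and the uniform random interpolation between real and generated samples are the usual defaults, not settings confirmed for this study.

```latex
% Critic loss with gradient penalty on random interpolates
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\!\left[D(\tilde{x})\right]
    - \mathbb{E}_{x \sim P_r}\!\left[D(x)\right]
    + \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\!\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right]

% Interpolates between real and generated samples
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \qquad \epsilon \sim U[0, 1]

% Generator loss
L_G = -\,\mathbb{E}_{\tilde{x} \sim P_g}\!\left[D(\tilde{x})\right]
```

The penalty softly enforces the 1-Lipschitz constraint that the Wasserstein formulation requires, replacing the weight clipping of the original WGAN; this generally makes training more stable, which matters when synthesizing rare health-state samples to rebalance a dataset.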