Class imbalance can substantially affect classification tasks using traditional classifiers, especially when identifying instances of minority categories. In addition to class imbalance, other challenges can also hinder accurate classification. Researchers have explored various approaches to mitigate the effects of class imbalance. However, most studies focus only on processing correlations within a single category of samples. This paper introduces an ensemble framework called Inter- and Intra-Class Overlapping Ensemble (IICOE), which incorporates two sampling methods. The first method, which is based on classification hardness undersampling, targets majority category samples by using simple samples as the foundation for classification and improving performance by focusing on samples near classification boundaries. The second method addresses the issue of overfitting minority category samples in undersampling and ensemble learning. To mitigate this, an adaptive augment hybrid sampling method is proposed, which enhances the classification boundary of samples and reduces overfitting. This paper conducts multiple experiments on 15 public datasets and concludes that the IICOE ensemble framework outperforms other ensemble learning algorithms in classifying imbalanced data.
Industrial data mining usually deals with data from different sources. These heterogeneous datasets describe the same object from different views. However, samples from some of the datasets may be lost, so the remaining samples no longer correspond one-to-one. Mismatched datasets caused by missing samples make the industrial data unusable for further machine learning. To align the mismatched samples, this article presents a cooperative iteration matching method (CIMM) based on modified dynamic time warping (DTW). The proposed method treats the sequentially accumulated industrial data as time series. Mismatched samples are aligned by DTW. In addition, dynamic constraints are applied to the warping distance of the DTW process to make the alignment more efficient. A series of models is then trained iteratively with the accumulated samples. Several groups of numerical experiments on different missing patterns and missing locations are designed and analyzed to demonstrate the effectiveness and applicability of the proposed method.
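The alignment idea in the abstract above can be illustrated with a minimal sketch of classic DTW. The abstract does not spell out its "dynamic constraints", so a fixed band around the diagonal (a Sakoe-Chiba style window) is used here as a stand-in assumption; the function name `dtw_distance` and the toy sequences are hypothetical and not taken from the paper.

```python
import math


def dtw_distance(a, b, window=None):
    """Classic DTW distance between two numeric sequences.

    `window` limits how far the warping path may stray from the diagonal.
    This fixed band is an assumption standing in for the paper's
    (unspecified) dynamic constraints.
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, or no warping path exists.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


full = [1.0, 2.0, 3.0, 4.0, 5.0]
with_gap = [1.0, 2.0, 4.0, 5.0]  # the sample with value 3.0 was lost
print(dtw_distance(full, with_gap, window=1))
```

A narrower band reduces the number of cells evaluated, which is one common way a constraint on the warping path makes alignment cheaper.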
A novel data stream partitioning method is proposed to resolve problems of range-aggregation continuous queries over parallel streams for the power industry. The first step of this method is to sample the data in parallel, which is implemented as an extended reservoir-sampling algorithm. A skip factor based on the change ratio of data values is introduced to describe the distribution characteristics of data values adaptively. The second step is to partition the fluxes of data streams evenly, which is implemented with two alternative equal-depth histogram generating algorithms that fit different cases: one for incremental maintenance based on heuristics and the other for periodic updates to generate an approximate partition vector. Experimental results on actual data show that the method is efficient, practical and suitable for processing time-varying data streams.
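For readers unfamiliar with the baseline that the abstract above extends, the following is a minimal sketch of plain reservoir sampling (Algorithm R). It does not reproduce the paper's extension: the skip factor and the parallel sampling step are omitted, and the name `reservoir_sample` is an illustrative assumption.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items drawn uniformly at random from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for seen, item in enumerate(stream, start=1):
        if seen <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / seen by drawing a slot
            # uniformly from [0, seen) and replacing only if it falls in range.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(10_000), k=5, rng=random.Random(0))
print(sample)
```

The stream is read once and only k items are held in memory, which is why the technique suits continuous data streams.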
To improve the precision of super point detection and to control measurement resource consumption, this paper proposes a super point detection method based on sampling and data streaming algorithms (SDSD), and proves that only sources or destinations with many flows can be sampled probabilistically using the SDSD algorithm. The SDSD algorithm uses both an IP table and a flow Bloom filter (BF) to maintain the IP and flow information. The IP table is used to judge whether an IP address has been recorded. If the IP exists, then all its subsequent flows are recorded in the flow BF; otherwise, the IP flow is sampled. This paper also analyzes the accuracy and memory requirements of the SDSD algorithm, and tests them using the CERNET trace. The theoretical analysis and experimental tests demonstrate that most relative errors of the super points estimated by the SDSD algorithm are less than 5%, whereas the results of other algorithms are about 10%. Because of the BF structure, the SDSD algorithm is also better than previous algorithms in terms of memory consumption.
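The control flow described in the abstract above (check the IP table, then either record the flow in a Bloom filter or sample the IP) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the SDSD algorithm itself: the class `BloomFilter`, the function `observe`, the per-IP counter, and the fixed `sample_prob` are all hypothetical choices made for the example.

```python
import hashlib
import random


class BloomFilter:
    """Small Bloom filter; one byte per bit is used for readability."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Derive several hash positions by salting SHA-256 with an index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def observe(src, dst, ip_table, flow_bf, sample_prob, rng):
    """Process one packet; ip_table maps a source IP to its distinct-flow count."""
    flow = f"{src}->{dst}"
    if src in ip_table:
        # Known IP: count the flow only if the filter has not seen it yet.
        if flow not in flow_bf:
            flow_bf.add(flow)
            ip_table[src] += 1
    elif rng.random() < sample_prob:
        # Unknown IP: admit it into the table with probability sample_prob.
        ip_table[src] = 1
        flow_bf.add(flow)


table, bf, rng = {}, BloomFilter(), random.Random(0)
for dst in ["a", "b", "c", "a", "b"]:
    observe("10.0.0.1", dst, table, bf, sample_prob=1.0, rng=rng)
print(table)
```

Because a Bloom filter can report false positives, the per-IP count in such a scheme is an estimate that may undercount slightly; it never overcounts repeated flows.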
Imbalance is a distinctive feature of many datasets, and how to balance a dataset has become a hot topic in the machine learning field. The Synthetic Minority Oversampling Technique (SMOTE) is the classical method for solving this problem. Although much research has been conducted on SMOTE, the problem of synthetic sample singularity remains. To address class imbalance and the diversity of generated samples, this paper proposes a hybrid resampling method for binary imbalanced datasets, RE-SMOTE, which is designed based on improvements to two oversampling methods: parameter-free SMOTE (PF-SMOTE) and SMOTE-Weighted Edited Nearest Neighbor (SMOTE-WENN). Initially, minority class samples are divided into safe and boundary minority categories. Boundary minority samples are regenerated through linear interpolation with the nearest majority class samples. In contrast, safe minority samples are randomly generated within a circular range centered on the initial safe minority samples, with a radius determined by the distance to the nearest majority class samples. Furthermore, we use Weighted Edited Nearest Neighbor (WENN) and relative density methods to clean the generated samples and remove the low-quality samples. Relative density is calculated based on the ratio of majority to minority samples among the reverse k-nearest neighbor samples. To verify the effectiveness and robustness of the proposed model, we conducted a comprehensive experimental study on 40 datasets selected from real applications. The experimental results show the superiority of radius estimation-SMOTE (RE-SMOTE) over other state-of-the-art methods. Code is available at: https://github.com/blue9792/RE-SMOTE (accessed on 30 September 2024).
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and two sampling-based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces the sampling technique inside DBSCAN, and the other uses a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
China's continental deposition basins are characterized by complex geological structures and various reservoir lithologies. Therefore, high precision exploration methods are needed. High density spatial sampling is a new technology to increase the accuracy of seismic exploration. We briefly discuss point source and receiver technology, analyze the high density spatial sampling in situ method, introduce the symmetric sampling principles presented by Gijs J. O. Vermeer, and discuss high density spatial sampling technology from the point of view of wave field continuity. We emphasize the analysis of the high density spatial sampling characteristics, including the advantages of high density first breaks for investigating near surface structure, improved static correction precision, the use of dense receiver spacing at short offsets to increase the effective coverage at shallow depth, and the accuracy of reflection imaging. Coherent noise is not aliased, so the precision of noise analysis and suppression increases as a result. High density spatial sampling enhances wave field continuity and the accuracy of various mathematical transforms, which benefits wave field separation. Finally, we point out that the difficult part of high density spatial sampling technology is the data processing. More research needs to be done on methods of analyzing and processing huge amounts of seismic data.
For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets, generating synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE encounters the overgeneralization problem. The density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make clustering more reasonable. This paper integrates the optimized DBSCAN and SMOTE, and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups: core samples, borderline samples and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample them. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall and F-value.
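The interpolation step that the abstract above attributes to SMOTE can be shown with a minimal sketch. It covers only baseline SMOTE, not the DBSCAN-based grouping of DSMOTE; the function name `smote_sample`, the default `k`, and the toy points are illustrative assumptions.

```python
import math
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and a near neighbour.

    `minority` is a list of at least two equal-length numeric tuples.
    """
    rng = rng or random.Random()
    base = rng.choice(minority)
    # The k nearest minority neighbours of the base point, excluding itself.
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment from base to neighbour
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))


minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote_sample(minority_points, k=2, rng=random.Random(42)))
```

Because every synthetic point lies on a segment between two minority samples, noisy minority samples can pull synthetic points into majority regions, which is the overgeneralization issue the abstract mentions.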
The capability of accurately predicting the mineralogical brittleness index (BI) from basic suites of well logs is desirable, as it provides a useful indicator of the fracability of tight formations. Measuring mineralogical components in rocks is expensive and time consuming. However, the basic well log curves are not well correlated with BI, so correlation-based machine-learning methods are not able to derive highly accurate BI predictions using such data. A correlation-free, optimized data-matching algorithm is configured to predict BI on a supervised basis from well log and core data available from two published wells in the Lower Barnett Shale Formation (Texas). This transparent open box (TOB) algorithm matches data records by calculating the sum of squared errors between their variables and selecting the best matches as those with the minimum squared errors. It then applies optimizers to adjust the weights applied to individual variable errors to minimize the root mean square error (RMSE) between calculated and predicted BI. The prediction accuracy achieved by TOB using just five well logs (Gr, ρb, Ns, Rs, Dt) to predict BI depends on the density of data records sampled. At a sampling density of about one sample per 0.5 ft, BI is predicted with RMSE ≈ 0.056 and R² ≈ 0.790. At a sampling density of about one sample per 0.1 ft, BI is predicted with RMSE ≈ 0.008 and R² ≈ 0.995. Adding a stratigraphic height index as an additional (sixth) input variable improves BI prediction accuracy to RMSE ≈ 0.003 and R² ≈ 0.999 for the two wells, with only 1 record in 10,000 yielding a BI prediction error greater than ±0.1. The model has the potential to be applied on an unsupervised basis to predict BI from basic well log data in surrounding wells that lack mineralogical measurements but have similar lithofacies and burial histories. The method could also be extended to predict elastic rock properties and seismic attributes from well and seismic data, to improve the precision of spatial brittleness index and fracability mapping.
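The record-matching step described in the abstract above can be sketched as a weighted sum of squared errors followed by selection of the closest records. This is only an illustration of that idea: averaging the targets of the `top_k` best matches, the fixed weights, and the name `predict_by_matching` are assumptions, and the optimizer that tunes the weights is not shown.

```python
def predict_by_matching(query, records, weights, top_k=2):
    """Predict a target value from the best-matching records.

    `records` is a list of (features, target) pairs. Match quality is the
    weighted sum of squared differences between query and record features.
    """
    scored = sorted(
        (sum(w * (q - x) ** 2 for w, q, x in zip(weights, query, features)), target)
        for features, target in records
    )
    best = scored[:top_k]
    # Assumption: a plain average of the best matches' targets.
    return sum(target for _, target in best) / len(best)


# Toy records: (two normalised log readings, brittleness index); values are made up.
toy_records = [
    ((0.0, 0.0), 0.1),
    ((1.0, 1.0), 0.5),
    ((0.1, 0.0), 0.3),
]
print(predict_by_matching((0.0, 0.0), toy_records, weights=(1.0, 1.0), top_k=2))
```

Because no correlation between inputs and target is fitted, the quality of such a prediction depends on how densely the reference records sample the variable space, which is consistent with the sampling-density effect reported in the abstract.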
Material identification is critical for understanding the relationship between mechanical properties and the associated mechanical functions. However, material identification is a challenging task, especially when the characteristics of the material are highly nonlinear, as is common in biological tissue. In this work, we identify unknown material properties in continuum solid mechanics via physics-informed neural networks (PINNs). To improve the accuracy and efficiency of PINNs, we develop efficient strategies to nonuniformly sample observational data. We also investigate different approaches to enforce Dirichlet-type boundary conditions (BCs) as soft or hard constraints. Finally, we apply the proposed methods to a diverse set of time-dependent and time-independent solid mechanics examples that span linear elastic and hyperelastic material space. The estimated material parameters achieve relative errors of less than 1%. As such, this work is relevant to diverse applications, including optimizing structural integrity and developing novel materials.
Freeform surface measurement is a key basic technology for product quality control and reverse engineering in the aerospace field. Surface measurement technology based on multi-sensor fusion, such as a laser scanner combined with a contact probe, can exploit the complementary characteristics of different sensors, and has attracted wide attention in industry and academia. The number and distribution of measurement points significantly affect the efficiency of multi-sensor fusion and the accuracy of surface reconstruction. An aggregation-value-based active sampling method for multi-sensor freeform surface measurement and reconstruction is proposed. Based on game theory iteration, probe measurement points are generated actively, and the importance of each measurement point on the freeform surface to multi-sensor fusion is defined as the Shapley value of the measurement point. Thus, the problem of obtaining the optimal measurement point set is transformed into the problem of maximizing the aggregation value of the sample set. Simulation and real measurement results verify that the proposed method can significantly reduce the required probe sample size while ensuring the measurement accuracy of multi-sensor fusion.
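Since the abstract above scores each measurement point by its Shapley value, a minimal sketch of an exact Shapley computation may help. The paper's actual value function for multi-sensor fusion is not given in the abstract, so the additive `toy_value` below is purely hypothetical, as are the point names; the brute-force enumeration is only practical for a handful of points.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    `value` maps a frozenset of players to a number.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical value function: each probe point contributes a fixed gain.
gains = {"p1": 1.0, "p2": 2.0, "p3": 3.0}


def toy_value(coalition):
    return sum(gains[p] for p in coalition)


print(shapley_values(list(gains), toy_value))
```

For an additive value function each point's Shapley value equals its own gain; with interacting points the values differ, and their sum still equals the value of the full set.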
In this paper, consensus problems of heterogeneous multi-agent systems based on sampled data with a small sampling delay are considered. First, a consensus protocol based on sampled data with a small sampling delay for heterogeneous multi-agent systems is proposed. Then, algebraic graph theory, the matrix method, the stability theory of linear systems, and some other techniques are employed to derive the necessary and sufficient conditions guaranteeing that heterogeneous multi-agent systems asymptotically achieve stationary consensus. Finally, simulations are performed to demonstrate the correctness of the theoretical results.
Objective: To develop methods for determining a suitable sample size for bioequivalence assessment of generic topical ophthalmic drugs using a crossover design with serial sampling schemes. Methods: The power functions of the Fieller-type confidence interval and the asymptotic confidence interval in crossover designs with serial-sampling data are derived. Simulation studies were conducted to evaluate the derived power functions. Results: Simulation studies show that the two power functions provide precise power estimates when normality assumptions are satisfied and yield conservative estimates of power when data are log-normally distributed. The power of the bioequivalence test increased with the intra-correlation. When the expected ratio of the AUCs was less than or equal to 1, the power of the Fieller-type confidence interval was larger than that of the asymptotic confidence interval. If the expected ratio of the AUCs was larger than 1, the asymptotic confidence interval had greater power. Sample size can be calculated through numerical iteration with the derived power functions. Conclusion: The Fieller-type power function and the asymptotic power function can be used to determine sample sizes of crossover trials for bioequivalence assessment of topical ophthalmic drugs.
This paper is concerned with a novel Lyapunov-like functional approach to the stability of sampled-data systems with variable sampling periods. The Lyapunov-like functional has four striking characteristics compared to usual ones. First, it is time-dependent. Second, it may be discontinuous. Third, not every term of it is required to be positive definite. Fourth, the Lyapunov functional includes not only the state and the sampled state but also the integral of the state. By using a recently reported inequality to estimate the derivative of this Lyapunov functional, a sampled-interval-dependent stability criterion with reduced conservatism is obtained. The stability criterion is further extended to sampled-data systems with polytopic uncertainties. Finally, three examples are given to illustrate the reduced conservatism of the stability criteria.
The security control of Markovian jumping neural networks (MJNNs) is investigated under false data injection attacks (FDIAs) that take place in the shared communication network. Stochastic sampled-data control is employed to study the exponential synchronization of MJNNs under FDIAs, since it can alleviate the impact of the FDIAs on the performance of the system by adjusting the sampling periods. A multi-delay error system model is established through the input-delay approach. To reduce the conservatism of the results, a sampling-period-probability-dependent looped Lyapunov functional is constructed. Using some less conservative integral inequalities, a synchronization criterion is derived, and an algorithm is provided for determining the controller gain. Finally, a numerical simulation is presented to confirm the efficiency of the proposed method.
The world of information technology is more than ever being flooded with huge amounts of data, nearly 2.5 quintillion bytes every day. This large stream of data is called big data, and the amount is increasing each day. This research uses a technique called sampling, which selects a representative subset of the data points, manipulates and analyzes this subset to identify patterns and trends in the larger dataset being examined, and finally creates models. Sampling uses a small proportion of the original data for analysis and model training, so that it is relatively faster while maintaining data integrity and achieving accurate results. Two deep neural networks, AlexNet and DenseNet, were used in this research to test two sampling techniques, namely sampling with replacement and reservoir sampling. The dataset used for this research was divided into three classes: acceptable, flagged as easy, and flagged as hard. The base models were trained with the whole dataset, whereas the other models were trained on 50% of the original dataset. There were four combinations of model and sampling technique. The F-measure for the AlexNet model was 0.807 while that for the DenseNet model was 0.808. Combination 1 was the AlexNet model and sampling with replacement, achieving an average F-measure of 0.8852. Combination 3 was the AlexNet model and reservoir sampling. It had an average F-measure of 0.8545. Combination 2 was the DenseNet model and sampling with replacement, achieving an average F-measure of 0.8017. Finally, combination 4 was the DenseNet model and reservoir sampling. It had an average F-measure of 0.8111. Overall, we conclude that both models trained on a sampled dataset gave equal or better results compared to the base models, which used the whole dataset.
The fractionating tower bottom in a fluid catalytic cracking unit (FCCU) is highly susceptible to coking due to the interplay of complex external operating conditions and internal physical properties. Consequently, quantitative risk assessment (QRA) and predictive maintenance (PdM) are essential to effectively manage coking risks influenced by multiple factors. However, the inherent uncertainties of the coking process, combined with the mixed-frequency nature of distributed control system (DCS) and laboratory information management system (LIMS) data, present significant challenges for the application of data-driven methods and their practical implementation in industrial environments. This study proposes a hierarchical framework that integrates deep learning and fuzzy logic inference, leveraging data and domain knowledge to monitor the coking condition and inform prescriptive maintenance planning. The framework uses a multi-layer fuzzy inference system to construct the coking risk index, applies multi-label methods to select the optimal feature dataset across the reactor-regenerator and fractionation system with coking risk factors as the label space, and designs a parallel encoder-integrated decoder architecture that addresses mixed-frequency data disparities and enhances adaptation capabilities by extracting operation state and physical property information. Additionally, triple attention mechanisms, whether in parallel or temporal modules, adaptively aggregate input information and enhance intrinsic interpretability to support disposal decision-making. Applied to a 2.8 million ton FCCU under long-period complex operating conditions, the framework enables precise coking risk management at the fractionating tower bottom.
In the face of data scarcity in the optimization of maintenance strategies for civil aircraft, traditional failure data-driven methods are encountering challenges owing to the increasing reliability of aircraft design. This study addresses this issue by presenting a novel combined data fusion algorithm, which enhances the accuracy and reliability of failure rate analysis for a specific aircraft model by integrating historical failure data from similar models as supplementary information. Through a comprehensive analysis of two different maintenance projects, this study illustrates the application process of the algorithm. Building upon the analysis results, this paper introduces the equal integral value method as a replacement for the conventional equal interval method in maintenance schedule optimization. A Monte Carlo simulation example shows that the equal integral value method surpasses the traditional method by over 20% in terms of inspection efficiency ratio. This result indicates that the equal integral value method not only upholds maintenance efficiency but also substantially decreases workload and maintenance costs. The findings of this study open up new perspectives for airlines grappling with data scarcity, offer fresh strategies for the optimization of aviation maintenance practices, and chart a course toward more efficient and cost-effective maintenance schedule optimization through refined data analysis.
The Fourier transform is the basis of the analysis. This paper presents a method for determining the profile of the inverted object in inverse scattering from a minimum amount of sampled data.
With the rapid expansion of multimedia data, protecting digital information has become increasingly critical. Reversible data hiding offers an effective solution by allowing sensitive information to be embedded in multimedia files while enabling full recovery of the original data after extraction. Audio, as a vital medium in communication, entertainment, and information sharing, demands the same level of security as images. However, embedding data in encrypted audio poses unique challenges due to the trade-offs between security, data integrity, and embedding capacity. This paper presents a novel interpolation-based reversible data hiding algorithm for encrypted audio that achieves scalable embedding capacity. By increasing sample density through interpolation, embedding opportunities are significantly enhanced while maintaining encryption throughout the process. The method further integrates multiple most significant bit (multi-MSB) prediction and Huffman coding to optimize compression and embedding efficiency. Experimental results on standard audio datasets demonstrate the proposed algorithm's ability to embed up to 12.47 bits per sample, with over 9.26 bits per sample available for pure embedding capacity, while preserving full reversibility. These results confirm the method's suitability for secure applications that demand high embedding capacity and perfect reconstruction of original audio. This work advances reversible data hiding in encrypted audio by offering a secure, efficient, and fully reversible data hiding framework.
Funding (IICOE ensemble framework paper): supported by the National Natural Science Foundation of China (No. 62173158), the National Key Research and Development Program of China (No. 2019YFC0119600), and the Major Science and Technology Program of Hainan Province (No. ZDKJ202004).
Funding (cooperative iteration matching method paper): the Key National Natural Science Foundation of China (No. U1864211), the National Natural Science Foundation of China (No. 11772191), and the Natural Science Foundation of Shanghai (No. 21ZR1431500).
Funding (data stream partitioning paper): the High Technology Research Plan of Jiangsu Province (No. BG2004034) and the Foundation of Graduate Creative Program of Jiangsu Province (No. xm04-36).
Funding: The National Basic Research Program of China (973 Program) (No. 2009CB320505); the Natural Science Foundation of Jiangsu Province (No. BK2008288); the Excellent Young Teachers Program of Southeast University (No. 4009001018); the Open Research Program of Key Laboratory of Computer Network of Guangdong Province (No. CCNL200706).
Abstract: To improve the precision of super point detection and control measurement resource consumption, this paper proposes a super point detection method based on sampling and data streaming algorithms (SDSD), and proves that, under the SDSD algorithm, only sources or destinations with a large number of flows can be sampled probabilistically. The SDSD algorithm uses both an IP table and a flow Bloom filter (BF) to maintain the IP and flow information. The IP table is used to judge whether an IP address has been recorded. If the IP exists, all its subsequent flows are recorded in the flow BF; otherwise, the IP flow is sampled. This paper also analyzes the accuracy and memory requirements of the SDSD algorithm and tests them using the CERNET trace. The theoretical analysis and experimental tests demonstrate that most relative errors of the super points estimated by the SDSD algorithm are less than 5%, whereas those of other algorithms are about 10%. Because of the BF structure, the SDSD algorithm also outperforms previous algorithms in terms of memory consumption.
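The interplay of the IP table and the flow Bloom filter can be sketched as below. This is a simplified illustration, not the SDSD algorithm itself: the fixed admission probability `sample_prob`, the SHA-256 based hashing, the filter size, and all names are assumptions made for the example.

```python
import hashlib
import random

class BloomFilter:
    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive k bit positions from seeded SHA-256 digests of the key.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        """Insert key; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(key):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen

def count_flows(packets, sample_prob, rng=None):
    """Estimate distinct flows per source IP for (src, dst) packets."""
    rng = rng or random.Random()
    ip_table = {}            # source IP -> distinct flows counted since admission
    flow_bf = BloomFilter()
    for src, dst in packets:
        if src in ip_table:
            # Recorded IP: count the flow only if the filter has not seen it.
            if not flow_bf.add(f"{src}>{dst}"):
                ip_table[src] += 1
        elif rng.random() < sample_prob:
            # Unrecorded IP: admit it with probability sample_prob (assumed rule).
            ip_table[src] = 1
            flow_bf.add(f"{src}>{dst}")
    return ip_table
```

A source that opens many flows gets many chances to be admitted, so heavy sources are far more likely to end up in the table than light ones, which is the intuition behind the sampling claim in the abstract. Bloom filter false positives can only cause undercounting in this sketch.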
Funding: Supported by the National Key R&D Program of China, No. 2022YFC3006302.
Abstract: Imbalance is a distinctive feature of many datasets, and how to balance such datasets has become a hot topic in the machine learning field. The Synthetic Minority Oversampling Technique (SMOTE) is the classical method for solving this problem. Although much research has been conducted on SMOTE, the problem of synthetic sample singularity remains. To address class imbalance and the diversity of generated samples, this paper proposes a hybrid resampling method for binary imbalanced datasets, RE-SMOTE, which builds on improvements to two oversampling methods, parameter-free SMOTE (PF-SMOTE) and SMOTE-Weighted Ensemble Nearest Neighbor (SMOTE-WENN). Initially, minority class samples are divided into safe and boundary minority categories. Boundary minority samples are regenerated through linear interpolation with the nearest majority class samples. In contrast, safe minority samples are randomly generated within a circular range centered on the initial safe minority samples, with a radius determined by the distance to the nearest majority class samples. Furthermore, Weighted Edited Nearest Neighbor (WENN) and relative density methods are used to clean the generated samples and remove low-quality samples. Relative density is calculated from the ratio of majority to minority samples among the reverse k-nearest neighbor samples. To verify the effectiveness and robustness of the proposed model, we conducted a comprehensive experimental study on 40 datasets selected from real applications. The experimental results show the superiority of radius estimation-SMOTE (RE-SMOTE) over other state-of-the-art methods. Code is available at: https://github.com/blue9792/RE-SMOTE (accessed on 30 September 2024).
Funding: Supported by the Open Research Fund Program of LIESMARS (WKL(00)0302).
Abstract: Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and two sampling-based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces the sampling technique inside DBSCAN, and the other uses a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
Abstract: China's continental deposition basins are characterized by complex geological structures and various reservoir lithologies. Therefore, high-precision exploration methods are needed. High-density spatial sampling is a new technology for increasing the accuracy of seismic exploration. We briefly discuss point source and receiver technology, analyze the high-density spatial sampling in situ method, introduce the symmetric sampling principles presented by Gijs J. O. Vermeer, and discuss high-density spatial sampling technology from the point of view of wave field continuity. We emphasize the analysis of the characteristics of high-density spatial sampling, including the advantages of high-density first breaks for investigating near-surface structure and improving static correction precision, the use of dense receiver spacing at short offsets to increase the effective coverage at shallow depth, and the accuracy of reflection imaging. Coherent noise is not aliased, so the precision of noise analysis and suppression increases as a result. High-density spatial sampling enhances wave field continuity and the accuracy of various mathematical transforms, which benefits wave field separation. Finally, we point out that the difficult part of high-density spatial sampling technology is the data processing. More research is needed on methods for analyzing and processing huge amounts of seismic data.
Funding: Supported by the National Key Research and Development Program of China (2018YFB1003700); the Scientific and Technological Support Project (Society) of Jiangsu Province (BE2016776); the "333" Project of Jiangsu Province (BRA2017228, BRA2017401); the Talent Project in Six Fields of Jiangsu Province (2015-JNHB-012).
Abstract: For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets, generating synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE encounters the overgeneralization problem. Density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make clustering more reasonable. This paper integrates the optimized DBSCAN and SMOTE, and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups: core samples, borderline samples, and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each group. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall, and F-value.
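The interpolation at the heart of plain SMOTE, as summarized in the abstract above, can be sketched in a few lines. This is the baseline technique only: the DBSCAN grouping and the separate strategies for core and borderline samples in DSMOTE are not reproduced, and the function name and parameters are hypothetical.

```python
import math
import random

def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between minority neighbours.

    `minority` is a list of at least two equal-length numeric tuples.
    """
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself).
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(corners, 10, k=2, rng=random.Random(1))
```

Every synthetic point lies on a segment between two existing minority samples. When a neighbour sits inside majority territory, the new point can land there too, which is one way the overgeneralization mentioned in the abstract arises.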
Abstract: The capability of accurately predicting the mineralogical brittleness index (BI) from basic suites of well logs is desirable, as it provides a useful indicator of the fracability of tight formations. Measuring mineralogical components in rocks is expensive and time-consuming. However, the basic well log curves are not well correlated with BI, so correlation-based machine-learning methods are not able to derive highly accurate BI predictions from such data. A correlation-free, optimized data-matching algorithm is configured to predict BI on a supervised basis from well log and core data available from two published wells in the Lower Barnett Shale Formation (Texas). This transparent open box (TOB) algorithm matches data records by calculating the sum of squared errors between their variables and selecting the best matches as those with the minimum squared errors. It then applies optimizers to adjust the weights applied to individual variable errors to minimize the root mean square error (RMSE) between calculated and predicted BI. The prediction accuracy achieved by TOB using just five well logs (Gr, ρb, Ns, Rs, Dt) to predict BI depends on the density of data records sampled. At a sampling density of about one sample per 0.5 ft, BI is predicted with RMSE ~ 0.056 and R^2 ~ 0.790. At a sampling density of about one sample per 0.1 ft, BI is predicted with RMSE ~ 0.008 and R^2 ~ 0.995. Adding a stratigraphic height index as an additional (sixth) input variable improves BI prediction accuracy to RMSE ~ 0.003 and R^2 ~ 0.999 for the two wells, with only 1 record in 10,000 yielding a BI prediction error of > ±0.1. The model has the potential to be applied on an unsupervised basis to predict BI from basic well log data in surrounding wells that lack mineralogical measurements but have similar lithofacies and burial histories. The method could also be extended to predict elastic rock properties in, and seismic attributes from, wells and seismic data to improve the precision of brittleness index and fracability mapping spatially.
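The matching step described in this abstract can be sketched as a weighted sum-of-squared-errors lookup. The optimizer that tunes the per-variable weights is omitted, and the inverse-error averaging of the top matches is an assumption made for the example, as are the function name and parameters.

```python
def predict_by_matching(train_x, train_y, query, weights=None, top_k=3):
    """Predict a target for `query` from its closest training records.

    Closeness is the weighted sum of squared errors across variables;
    the prediction averages the top_k matches, weighted by inverse error
    (an assumed combination rule).
    """
    weights = weights or [1.0] * len(query)
    scored = []
    for x, y in zip(train_x, train_y):
        sse = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, query))
        scored.append((sse, y))
    scored.sort(key=lambda t: t[0])       # smallest squared error first
    best = scored[:top_k]
    inv = [1.0 / (s + 1e-12) for s, _ in best]
    return sum(i * y for i, (_, y) in zip(inv, best)) / sum(inv)

# Toy records: two scaled log variables per record, with known BI values.
logs = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
bi = [0.1, 0.5, 0.9]
estimate = predict_by_matching(logs, bi, (1.0, 1.0))
```

Because prediction relies on finding close records rather than on a fitted correlation, denser sampling gives each query nearer matches, which is consistent with the accuracy gain the abstract reports at the finer sampling interval.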
Funding: Funded by the Cora Topolewski Cardiac Research Fund at the Children's Hospital of Philadelphia (CHOP); the Pediatric Valve Center Frontier Program at CHOP; the Additional Ventures Single Ventricle Research Fund Expansion Award; the National Institutes of Health (USA); supported by the program (Nos. NHLBI T32 HL007915 and NIH R01 HL153166); supported by the program (No. NIH R01 HL153166); supported by the U.S. Department of Energy (No. DE-SC0022953).
Abstract: Material identification is critical for understanding the relationship between mechanical properties and the associated mechanical functions. However, material identification is a challenging task, especially when the material behavior is highly nonlinear, as is common in biological tissue. In this work, we identify unknown material properties in continuum solid mechanics via physics-informed neural networks (PINNs). To improve the accuracy and efficiency of PINNs, we develop efficient strategies to nonuniformly sample observational data. We also investigate different approaches to enforce Dirichlet-type boundary conditions (BCs) as soft or hard constraints. Finally, we apply the proposed methods to a diverse set of time-dependent and time-independent solid mechanics examples that span the linear elastic and hyperelastic material space. The estimated material parameters achieve relative errors of less than 1%. As such, this work is relevant to diverse applications, including optimizing structural integrity and developing novel materials.
Funding: Supported by the National Key R&D Program of China (No. 2022YFB3402600); the National Science Fund for Distinguished Young Scholars (No. 51925505); the General Program of the National Natural Science Foundation of China (No. 52275491); Joint Funds of the National Natural Science Foundation of China (No. U21B2081).
Abstract: Freeform surface measurement is a key basic technology for product quality control and reverse engineering in the aerospace field. Surface measurement technology based on multi-sensor fusion, such as a laser scanner combined with a contact probe, can exploit the complementary characteristics of different sensors and has attracted wide attention in industry and academia. The number and distribution of measurement points significantly affect the efficiency of multi-sensor fusion and the accuracy of surface reconstruction. An aggregation-value-based active sampling method for multi-sensor freeform surface measurement and reconstruction is proposed. Based on game theory iteration, probe measurement points are generated actively, and the importance of each measurement point on the freeform surface to multi-sensor fusion is defined as the Shapley value of that measurement point. Thus, the problem of obtaining the optimal measurement point set is transformed into the problem of maximizing the aggregation value of the sample set. Simulation and real measurement results verify that the proposed method can significantly reduce the required probe sample size while ensuring the measurement accuracy of multi-sensor fusion.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61203147, 61374047, 61203126, and 61104092); the Humanities and Social Sciences Youth Funds of the Ministry of Education, China (Grant No. 12YJCZH218).
Abstract: In this paper, consensus problems of heterogeneous multi-agent systems based on sampled data with a small sampling delay are considered. First, a consensus protocol based on sampled data with a small sampling delay is proposed for heterogeneous multi-agent systems. Then, algebraic graph theory, the matrix method, the stability theory of linear systems, and some other techniques are employed to derive necessary and sufficient conditions that guarantee heterogeneous multi-agent systems asymptotically achieve stationary consensus. Finally, simulations are performed to demonstrate the correctness of the theoretical results.
Funding: Supported by a sub-project of the National Major Scientific and Technological Special Project of China for "Significant New Drugs Development" [2015ZX09501008-004].
Abstract: Objective: To develop methods for determining a suitable sample size for bioequivalence assessment of generic topical ophthalmic drugs using a crossover design with serial sampling schemes. Methods: The power functions of the Fieller-type confidence interval and the asymptotic confidence interval in crossover designs with serial-sampling data are derived. Simulation studies were conducted to evaluate the derived power functions. Results: Simulation studies show that the two power functions provide precise power estimates when normality assumptions are satisfied and yield conservative estimates of power when the data are log-normally distributed. The intra-correlation showed a positive correlation with the power of the bioequivalence test. When the expected ratio of the AUCs was less than or equal to 1, the power of the Fieller-type confidence interval was larger than that of the asymptotic confidence interval. When the expected ratio of the AUCs was larger than 1, the asymptotic confidence interval had greater power. Sample size can be calculated through numerical iteration with the derived power functions. Conclusion: The Fieller-type power function and the asymptotic power function can be used to determine sample sizes of crossover trials for bioequivalence assessment of topical ophthalmic drugs.
Funding: Supported by the National Natural Science Foundation of China (61374090); the Program for Scientific Research Innovation Team in Colleges and Universities of Shandong Province; the Taishan Scholarship Project of Shandong Province.
Abstract: This paper is concerned with a novel Lyapunov-like functional approach to the stability of sampled-data systems with variable sampling periods. The Lyapunov-like functional has four distinctive characteristics compared with the usual ones. First, it is time-dependent. Second, it may be discontinuous. Third, not every term of it is required to be positive definite. Fourth, it includes not only the state and the sampled state but also the integral of the state. By using a recently reported inequality to estimate the derivative of this functional, a sampled-interval-dependent stability criterion with reduced conservatism is obtained. The stability criterion is further extended to sampled-data systems with polytopic uncertainties. Finally, three examples are given to illustrate the reduced conservatism of the stability criteria.
Funding: The NNSF of China under Grants 61973199, 62003794, 62173214; the Shandong Provincial NSF ZR2020QF050, ZR2021MF003.
Abstract: The security control of Markovian jumping neural networks (MJNNs) is investigated under false data injection attacks (FDIAs) that take place in the shared communication network. Stochastic sampled-data control is employed to study the exponential synchronization of MJNNs under FDIAs, since it can alleviate the impact of the FDIAs on system performance by adjusting the sampling periods. A multi-delay error system model is established through the input-delay approach. To reduce the conservatism of the results, a sampling-period-probability-dependent looped Lyapunov functional is constructed. In light of some less conservative integral inequalities, a synchronization criterion is derived, and an algorithm is provided for determining the controller gain. Finally, a numerical simulation is presented to confirm the efficiency of the proposed method.
Abstract: The world of information technology is more than ever being flooded with huge amounts of data, nearly 2.5 quintillion bytes every day. This large stream of data is called big data, and the amount is increasing each day. This research uses a technique called sampling, which selects a representative subset of the data points, manipulates and analyzes this subset to identify patterns and trends in the larger dataset being examined, and finally creates models. Sampling uses a small proportion of the original data for analysis and model training, so it is relatively fast while maintaining data integrity and achieving accurate results. Two deep neural networks, AlexNet and DenseNet, were used in this research to test two sampling techniques, namely sampling with replacement and reservoir sampling. The dataset used for this research was divided into three classes: acceptable, flagged as easy, and flagged as hard. The base models were trained with the whole dataset, whereas the other models were trained on 50% of the original dataset. There were four combinations of model and sampling technique. The F-measure for the base AlexNet model was 0.807, while that for the base DenseNet model was 0.808. Combination 1 was the AlexNet model with sampling with replacement, achieving an average F-measure of 0.8852. Combination 2 was the DenseNet model with sampling with replacement, achieving an average F-measure of 0.8017. Combination 3 was the AlexNet model with reservoir sampling, with an average F-measure of 0.8545. Finally, combination 4 was the DenseNet model with reservoir sampling, with an average F-measure of 0.8111. Overall, we conclude that both models trained on a sampled dataset gave equal or better results compared to the base models, which used the whole dataset.
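Of the two techniques compared above, sampling with replacement is the simpler to sketch. The helper below draws a fraction of the dataset with replacement. The function name and the use of Python's standard `random.choices` are assumptions made for illustration, not details taken from the study.

```python
import random

def sample_with_replacement(data, fraction, rng=None):
    """Draw int(len(data) * fraction) items, allowing the same item to be picked repeatedly."""
    rng = rng or random.Random()
    k = int(len(data) * fraction)
    return rng.choices(data, k=k)

records = list(range(1000))
subset = sample_with_replacement(records, 0.5, random.Random(0))
distinct = len(set(subset))   # at most len(subset), since duplicates are allowed
```

Because duplicates are allowed, a 50% draw can cover fewer than half of the distinct records. Reservoir sampling, by contrast, keeps each retained record exactly once.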
Funding: Financially supported by the Innovative Research Group Project of the National Natural Science Foundation of China (22021004); Sinopec Major Science and Technology Projects (321123-1).
Abstract: The fractionating tower bottom in a fluid catalytic cracking unit (FCCU) is highly susceptible to coking due to the interplay of complex external operating conditions and internal physical properties. Consequently, quantitative risk assessment (QRA) and predictive maintenance (PdM) are essential to effectively manage coking risks influenced by multiple factors. However, the inherent uncertainties of the coking process, combined with the mixed-frequency nature of distributed control system (DCS) and laboratory information management system (LIMS) data, present significant challenges for the application of data-driven methods and their practical implementation in industrial environments. This study proposes a hierarchical framework that integrates deep learning and fuzzy logic inference, leveraging data and domain knowledge to monitor the coking condition and inform prescriptive maintenance planning. The framework uses a multi-layer fuzzy inference system to construct the coking risk index, applies multi-label methods to select the optimal feature dataset across the reactor-regenerator and fractionation system with coking risk factors as the label space, and adopts a parallel encoder-integrated decoder architecture to address mixed-frequency data disparities and enhance adaptation capabilities by extracting operation state and physical property information. Additionally, triple attention mechanisms, whether in parallel or temporal modules, adaptively aggregate input information and enhance intrinsic interpretability to support disposal decision-making. Applied to a 2.8 million tons FCCU under long-period complex operating conditions, the framework enables precise coking risk management at the fractionating tower bottom.
Abstract: In the face of data scarcity in the optimization of maintenance strategies for civil aircraft, traditional failure data-driven methods are encountering challenges owing to the increasing reliability of aircraft design. This study addresses this issue by presenting a novel combined data fusion algorithm, which enhances the accuracy and reliability of failure rate analysis for a specific aircraft model by integrating historical failure data from similar models as supplementary information. Through a comprehensive analysis of two different maintenance projects, this study illustrates the application process of the algorithm. Building upon the analysis results, this paper introduces the innovative equal integral value method as a replacement for the conventional equal interval method in the context of maintenance schedule optimization. The Monte Carlo simulation example validates that the equivalent essential value method surpasses the traditional method by over 20% in terms of inspection efficiency ratio. This finding indicates that the equal critical value method not only upholds maintenance efficiency but also substantially decreases workload and maintenance costs. The findings of this study open up novel perspectives for airlines grappling with data scarcity, offer fresh strategies for the optimization of aviation maintenance practices, and chart a new course toward more efficient and cost-effective maintenance schedule optimization through refined data analysis.
Abstract: The Fourier transform is the basis of the analysis. This paper presents a method for determining the profile of the inverted object in inverse scattering from a minimum amount of sampled data.
Funding: Funded by the National Science and Technology Council of Taiwan under grant number NSTC 113-2221-E-035-058.
Abstract: With the rapid expansion of multimedia data, protecting digital information has become increasingly critical. Reversible data hiding offers an effective solution by allowing sensitive information to be embedded in multimedia files while enabling full recovery of the original data after extraction. Audio, as a vital medium in communication, entertainment, and information sharing, demands the same level of security as images. However, embedding data in encrypted audio poses unique challenges due to the trade-offs between security, data integrity, and embedding capacity. This paper presents a novel interpolation-based reversible data hiding algorithm for encrypted audio that achieves scalable embedding capacity. By increasing sample density through interpolation, embedding opportunities are significantly enhanced while encryption is maintained throughout the process. The method further integrates multiple most significant bit (multi-MSB) prediction and Huffman coding to optimize compression and embedding efficiency. Experimental results on standard audio datasets demonstrate the proposed algorithm's ability to embed up to 12.47 bits per sample, with over 9.26 bits per sample available as pure embedding capacity, while preserving full reversibility. These results confirm the method's suitability for secure applications that demand high embedding capacity and perfect reconstruction of the original audio. This work advances reversible data hiding in encrypted audio by offering a secure, efficient, and fully reversible data hiding framework.