The exponential expansion of the Internet of Things(IoT),Industrial Internet of Things(IIoT),and Transportation Management of Things(TMoT)produces vast amounts of real-time streaming data.Ensuring system dependability...The exponential expansion of the Internet of Things(IoT),Industrial Internet of Things(IIoT),and Transportation Management of Things(TMoT)produces vast amounts of real-time streaming data.Ensuring system dependability,operational efficiency,and security depends on the identification of anomalies in these dynamic and resource-constrained systems.Due to their high computational requirements and inability to efficiently process continuous data streams,traditional anomaly detection techniques often fail in IoT systems.This work presents a resource-efficient adaptive anomaly detection model for real-time streaming data in IoT systems.Extensive experiments were carried out on multiple real-world datasets,achieving an average accuracy score of 96.06%with an execution time close to 7.5 milliseconds for each individual streaming data point,demonstrating its potential for real-time,resourceconstrained applications.The model uses Principal Component Analysis(PCA)for dimensionality reduction and a Z-score technique for anomaly detection.It maintains a low computational footprint with a sliding window mechanism,enabling incremental data processing and identification of both transient and sustained anomalies without storing historical data.The system uses a Multivariate Linear Regression(MLR)based imputation technique that estimates missing or corrupted sensor values,preserving data integrity prior to anomaly detection.The suggested solution is appropriate for many uses in smart cities,industrial automation,environmental monitoring,IoT security,and intelligent transportation systems,and is particularly well-suited for resource-constrained edge devices.展开更多
Class imbalance can substantially affect classification tasks using traditional classifiers,especially when identifying instances of minority categories.In addition to class imbalance,other challenges can also hinder ...Class imbalance can substantially affect classification tasks using traditional classifiers,especially when identifying instances of minority categories.In addition to class imbalance,other challenges can also hinder accurate classification.Researchers have explored various approaches to mitigate the effects of class imbalance.However,most studies focus only on processing correlations within a single category of samples.This paper introduces an ensemble framework called Inter-and Intra-Class Overlapping Ensemble(llCOE),which incorporates two sampling methods.The first method,which is based on classification hardness undersampling,targets majority category samples by using simple samples as the foundation for classification and improving performance by focusing on samples near classification boundaries.The second method addresses the issue of overfitting minority category samples in undersampling and ensemble learning.To mitigate this,an adaptive augment hybrid sampling method is proposed,which enhances the classification boundary of samples and reduces overfitting.This paper conducts multiple experiments on 15 public datasets and concludes that the IlCOE ensemble framework outperforms other ensemble learning algorithms in classifying imbalanced data.展开更多
The Internet of Things(IoT)and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients.Recognizing Medical-Related Human Activities(MRHA)is pivotal for he...The Internet of Things(IoT)and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients.Recognizing Medical-Related Human Activities(MRHA)is pivotal for healthcare systems,particularly for identifying actions critical to patient well-being.However,challenges such as high computational demands,low accuracy,and limited adaptability persist in Human Motion Recognition(HMR).While some studies have integrated HMR with IoT for real-time healthcare applications,limited research has focused on recognizing MRHA as essential for effective patient monitoring.This study proposes a novel HMR method tailored for MRHA detection,leveraging multi-stage deep learning techniques integrated with IoT.The approach employs EfficientNet to extract optimized spatial features from skeleton frame sequences using seven Mobile Inverted Bottleneck Convolutions(MBConv)blocks,followed by Convolutional Long Short Term Memory(ConvLSTM)to capture spatio-temporal patterns.A classification module with global average pooling,a fully connected layer,and a dropout layer generates the final predictions.The model is evaluated on the NTU RGB+D 120 and HMDB51 datasets,focusing on MRHA such as sneezing,falling,walking,sitting,etc.It achieves 94.85%accuracy for cross-subject evaluations and 96.45%for cross-view evaluations on NTU RGB+D 120,along with 89.22%accuracy on HMDB51.Additionally,the system integrates IoT capabilities using a Raspberry Pi and GSM module,delivering real-time alerts via Twilios SMS service to caregivers and patients.This scalable and efficient solution bridges the gap between HMR and IoT,advancing patient monitoring,improving healthcare outcomes,and reducing costs.展开更多
Industrial data mining usually deals with data from different sources.These heterogeneous datasets describe the same object in different views.However,samples from some of the datasets may be lost.Then the remaining s...Industrial data mining usually deals with data from different sources.These heterogeneous datasets describe the same object in different views.However,samples from some of the datasets may be lost.Then the remaining samples do not correspond one-to-one correctly.Mismatched datasets caused by missing samples make the industrial data unavailable for further machine learning.In order to align the mismatched samples,this article presents a cooperative iteration matching method(CIMM)based on the modified dynamic time warping(DTW).The proposed method regards the sequentially accumulated industrial data as the time series.Mismatched samples are aligned by the DTW.In addition,dynamic constraints are applied to the warping distance of the DTW process to make the alignment more efficient.Then a series of models are trained with the cumulated samples iteratively.Several groups of numerical experiments on different missing patterns and missing locations are designed and analyzed to prove the effectiveness and the applicability of the proposed method.展开更多
Psychological distress detection plays a critical role in modern healthcare,especially in ambient environments where continuous monitoring is essential for timely intervention.Advances in sensor technology and artific...Psychological distress detection plays a critical role in modern healthcare,especially in ambient environments where continuous monitoring is essential for timely intervention.Advances in sensor technology and artificial intelligence(AI)have enabled the development of systems capable of mental health monitoring using multimodal data.However,existing models often struggle with contextual adaptation and real-time decision-making in dynamic settings.This paper addresses these challenges by proposing TRANS-HEALTH,a hybrid framework that integrates transformer-based inference with Belief-Desire-Intention(BDI)reasoning for real-time psychological distress detection.The framework utilizes a multimodal dataset containing EEG,GSR,heart rate,and activity data to predict distress while adapting to individual contexts.The methodology combines deep learning for robust pattern recognition and symbolic BDI reasoning to enable adaptive decision-making.The novelty of the approach lies in its seamless integration of transformermodelswith BDI reasoning,providing both high accuracy and contextual relevance in real time.Performance metrics such as accuracy,precision,recall,and F1-score are employed to evaluate the system’s performance.The results show that TRANS-HEALTH outperforms existing models,achieving 96.1% accuracy with 4.78 ms latency and significantly reducing false alerts,with an enhanced ability to engage users,making it suitable for deployment in wearable and remote healthcare environments.展开更多
The interleaving/multiplexing technique was used to realize a 200?MHz real time data acquisition system. Two 100?MHz ADC modules worked parallelly and every ADC plays out data in ping pang fashion. The design improv...The interleaving/multiplexing technique was used to realize a 200?MHz real time data acquisition system. Two 100?MHz ADC modules worked parallelly and every ADC plays out data in ping pang fashion. The design improved the system conversion rata to 200?MHz and reduced the speed of data transporting and storing to 50?MHz. The high speed HDPLD and ECL logic parts were used to control system timing and the memory address. The multi layer print board and the shield were used to decrease interference produced by the high speed circuit. The system timing was designed carefully. The interleaving/multiplexing technique could improve the system conversion rata greatly while reducing the speed of external digital interfaces greatly. The design resolved the difficulties in high speed system effectively. The experiment proved the data acquisition system is stable and accurate.展开更多
A novel data streams partitioning method is proposed to resolve problems of range-aggregation continuous queries over parallel streams for power industry.The first step of this method is to parallel sample the data,wh...A novel data streams partitioning method is proposed to resolve problems of range-aggregation continuous queries over parallel streams for power industry.The first step of this method is to parallel sample the data,which is implemented as an extended reservoir-sampling algorithm.A skip factor based on the change ratio of data-values is introduced to describe the distribution characteristics of data-values adaptively.The second step of this method is to partition the fluxes of data streams averagely,which is implemented with two alternative equal-depth histogram generating algorithms that fit the different cases:one for incremental maintenance based on heuristics and the other for periodical updates to generate an approximate partition vector.The experimental results on actual data prove that the method is efficient,practical and suitable for time-varying data streams processing.展开更多
In order to improve the precision of super point detection and control measurement resource consumption, this paper proposes a super point detection method based on sampling and data streaming algorithms (SDSD), and...In order to improve the precision of super point detection and control measurement resource consumption, this paper proposes a super point detection method based on sampling and data streaming algorithms (SDSD), and proves that only sources or destinations with a lot of flows can be sampled probabilistically using the SDSD algorithm. The SDSD algorithm uses both the IP table and the flow bloom filter (BF) data structures to maintain the IP and flow information. The IP table is used to judge whether an IP address has been recorded. If the IP exists, then all its subsequent flows will be recorded into the flow BF; otherwise, the IP flow is sampled. This paper also analyzes the accuracy and memory requirements of the SDSD algorithm , and tests them using the CERNET trace. The theoretical analysis and experimental tests demonstrate that the most relative errors of the super points estimated by the SDSD algorithm are less than 5%, whereas the results of other algorithms are about 10%. Because of the BF structure, the SDSD algorithm is also better than previous algorithms in terms of memory consumption.展开更多
In petroleum engineering,real-time lithology identification is very important for reservoir evaluation,drilling decisions and petroleum geological exploration.A lithology identification method while drilling based on ...In petroleum engineering,real-time lithology identification is very important for reservoir evaluation,drilling decisions and petroleum geological exploration.A lithology identification method while drilling based on machine learning and mud logging data is studied in this paper.This method can effectively utilize downhole parameters collected in real-time during drilling,to identify lithology in real-time and provide a reference for optimization of drilling parameters.Given the imbalance of lithology samples,the synthetic minority over-sampling technique(SMOTE)and Tomek link were used to balance the sample number of five lithologies.Meanwhile,this paper introduces Tent map,random opposition-based learning and dynamic perceived probability to the original crow search algorithm(CSA),and establishes an improved crow search algorithm(ICSA).In this paper,ICSA is used to optimize the hyperparameter combination of random forest(RF),extremely random trees(ET),extreme gradient boosting(XGB),and light gradient boosting machine(LGBM)models.In addition,this study combines the recognition advantages of the four models.The accuracy of lithology identification by the weighted average probability model reaches 0.877.The study of this paper realizes high-precision real-time lithology identification method,which can provide lithology reference for the drilling process.展开更多
Imbalance is a distinctive feature of many datasets,and how to make the dataset balanced become a hot topic in the machine learning field.The Synthetic Minority Oversampling Technique(SMOTE)is the classical method to ...Imbalance is a distinctive feature of many datasets,and how to make the dataset balanced become a hot topic in the machine learning field.The Synthetic Minority Oversampling Technique(SMOTE)is the classical method to solve this problem.Although much research has been conducted on SMOTE,there is still the problem of synthetic sample singularity.To solve the issues of class imbalance and diversity of generated samples,this paper proposes a hybrid resampling method for binary imbalanced data sets,RE-SMOTE,which is designed based on the improvements of two oversampling methods parameter-free SMOTE(PF-SMOTE)and SMOTE-Weighted Ensemble Nearest Neighbor(SMOTE-WENN).Initially,minority class samples are divided into safe and boundary minority categories.Boundary minority samples are regenerated through linear interpolation with the nearest majority class samples.In contrast,safe minority samples are randomly generated within a circular range centered on the initial safe minority samples with a radius determined by the distance to the nearest majority class samples.Furthermore,we use Weighted Edited Nearest Neighbor(WENN)and relative density methods to clean the generated samples and remove the low-quality samples.Relative density is calculated based on the ratio of majority to minority samples among the reverse k-nearest neighbor samples.To verify the effectiveness and robustness of the proposed model,we conducted a comprehensive experimental study on 40 datasets selected from real applications.The experimental results show the superiority of radius estimation-SMOTE(RE-SMOTE)over other state-of-the-art methods.Code is available at:https://github.com/blue9792/RE-SMOTE(accessed on 30 September 2024).展开更多
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recogni...Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, image processing, and etc. We combine sampling technique with DBSCAN algorithm to cluster large spatial databases, and two sampling based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces sampling technique inside DBSCAN, and the other uses sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large scale spatial databases.展开更多
Offshore waters provide resources for human beings,while on the other hand,threaten them because of marine disasters.Ocean stations are part of offshore observation networks,and the quality of their data is of great s...Offshore waters provide resources for human beings,while on the other hand,threaten them because of marine disasters.Ocean stations are part of offshore observation networks,and the quality of their data is of great significance for exploiting and protecting the ocean.We used hourly mean wave height,temperature,and pressure real-time observation data taken in the Xiaomaidao station(in Qingdao,China)from June 1,2017,to May 31,2018,to explore the data quality using eight quality control methods,and to discriminate the most effective method for Xiaomaidao station.After using the eight quality control methods,the percentages of the mean wave height,temperature,and pressure data that passed the tests were 89.6%,88.3%,and 98.6%,respectively.With the marine disaster(wave alarm report)data,the values failed in the test mainly due to the influence of aging observation equipment and missing data transmissions.The mean wave height is often affected by dynamic marine disasters,so the continuity test method is not effective.The correlation test with other related parameters would be more useful for the mean wave height.展开更多
China's continental deposition basins are characterized by complex geological structures and various reservoir lithologies. Therefore, high precision exploration methods are needed. High density spatial sampling is a...China's continental deposition basins are characterized by complex geological structures and various reservoir lithologies. Therefore, high precision exploration methods are needed. High density spatial sampling is a new technology to increase the accuracy of seismic exploration. We briefly discuss point source and receiver technology, analyze the high density spatial sampling in situ method, introduce the symmetric sampling principles presented by Gijs J. O. Vermeer, and discuss high density spatial sampling technology from the point of view of wave field continuity. We emphasize the analysis of the high density spatial sampling characteristics, including the high density first break advantages for investigation of near surface structure, improving static correction precision, the use of dense receiver spacing at short offsets to increase the effective coverage at shallow depth, and the accuracy of reflection imaging. Coherent noise is not aliased and the noise analysis precision and suppression increases as a result. High density spatial sampling enhances wave field continuity and the accuracy of various mathematical transforms, which benefits wave field separation. Finally, we point out that the difficult part of high density spatial sampling technology is the data processing. More research needs to be done on the methods of analyzing and processing huge amounts of seismic data.展开更多
For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic...For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic minority over-sampling technique(SMOTE) is specifically designed for learning from imbalanced datasets, generating synthetic minority class examples by interpolating between minority class examples nearby. However, the SMOTE encounters the overgeneralization problem. The densitybased spatial clustering of applications with noise(DBSCAN) is not rigorous when dealing with the samples near the borderline.We optimize the DBSCAN algorithm for this problem to make clustering more reasonable. This paper integrates the optimized DBSCAN and SMOTE, and proposes a density-based synthetic minority over-sampling technique(DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups, including core samples, borderline samples and noise samples, and then the noise samples of minority class is removed to synthesize more effective samples. In order to make full use of the information of core samples and borderline samples,different strategies are used to over-sample core samples and borderline samples. Experiments show that DSMOTE can achieve better results compared with SMOTE and Borderline-SMOTE in terms of precision, recall and F-value.展开更多
Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines(TBMs).During the TBM tunnelling process,a large number of operation data are g...Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines(TBMs).During the TBM tunnelling process,a large number of operation data are generated,reflecting the interaction between the TBM system and surrounding rock,and these data can be used to evaluate the rock mass quality.This study proposed a stacking ensemble classifier for the real-time prediction of the rock mass classification using TBM operation data.Based on the Songhua River water conveyance project,a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes are obtained after data preprocessing.Then,through the tree-based feature selection method,10 key TBM operation parameters are selected,and the mean values of the 10 selected features in the stable phase after removing outliers are calculated as the inputs of classifiers.The preprocessed data are randomly divided into the training set(90%)and test set(10%)using simple random sampling.Besides stacking ensemble classifier,seven individual classifiers are established as the comparison.These classifiers include support vector machine(SVM),k-nearest neighbors(KNN),random forest(RF),gradient boosting decision tree(GBDT),decision tree(DT),logistic regression(LR)and multilayer perceptron(MLP),where the hyper-parameters of each classifier are optimised using the grid search method.The prediction results show that the stacking ensemble classifier has a better performance than individual classifiers,and it shows a more powerful learning and generalisation ability for small and imbalanced samples.Additionally,a relative balance training set is obtained by the synthetic minority oversampling technique(SMOTE),and the influence of sample imbalance on the prediction performance is discussed.展开更多
High-resolution vehicular emissions inventories are important for managing vehicular pollution and improving urban air quality. This study developed a vehicular emission inventory with high spatio-temporal resolution ...High-resolution vehicular emissions inventories are important for managing vehicular pollution and improving urban air quality. This study developed a vehicular emission inventory with high spatio-temporal resolution in the main urban area of Chongqing, based on realtime traffic data from 820 RFID detectors covering 454 roads, and the differences in spatiotemporal emission characteristics between inner and outer districts were analysed. The result showed that the daily vehicular emission intensities of CO, hydrocarbons, PM2.5, PM10,and NO_(x) were 30.24, 3.83, 0.18, 0.20, and 8.65 kg/km per day, respectively, in the study area during 2018. The pollutants emission intensities in inner district were higher than those in outer district. Light passenger cars(LPCs) were the main contributors of all-day CO emissions in the inner and outer districts, from which the contributors of NO_(x) emissions were different. Diesel and natural gas buses were major contributors of daytime NO_(x) emissions in inner districts, accounting for 40.40%, but buses and heavy duty trucks(HDTs) were major contributors in outer districts. At nighttime, due to the lifting of truck restrictions and suspension of buses, HDTs become the main NO_(x) contributor in both inner and outer districts,and its three NO_(x) emission peak hours were found, which are different to the peak hours of total NO_(x) emission by all vehicles. Unlike most other cities, bridges and connecting channels are always emission hotspots due to long-time traffic congestion. This knowledge will help fully understand vehicular emissions characteristics and is useful for policymakers to design precise prevention and control measures.展开更多
The application and development of a wide-area measurement system(WAMS)has enabled many applications and led to several requirements based on dynamic measurement data.Such data are transmitted as big data information ...The application and development of a wide-area measurement system(WAMS)has enabled many applications and led to several requirements based on dynamic measurement data.Such data are transmitted as big data information flow.To ensure effective transmission of wide-frequency electrical information by the communication protocol of a WAMS,this study performs real-time traffic monitoring and analysis of the data network of a power information system,and establishes corresponding network optimization strategies to solve existing transmission problems.This study utilizes the traffic analysis results obtained using the current real-time dynamic monitoring system to design an optimization strategy,covering the optimization in three progressive levels:the underlying communication protocol,source data,and transmission process.Optimization of the system structure and scheduling optimization of data information are validated to be feasible and practical via tests.展开更多
The co-frequency vibration fault is one of the common faults in the operation of rotating equipment,and realizing the real-time diagnosis of the co-frequency vibration fault is of great significance for monitoring the...The co-frequency vibration fault is one of the common faults in the operation of rotating equipment,and realizing the real-time diagnosis of the co-frequency vibration fault is of great significance for monitoring the health state and carrying out vibration suppression of the equipment.In engineering scenarios,co-frequency vibration faults are highlighted by rotational frequency and are difficult to identify,and existing intelligent methods require more hardware conditions and are exclusively time-consuming.Therefore,Lightweight-convolutional neural networks(LW-CNN)algorithm is proposed in this paper to achieve real-time fault diagnosis.The critical parameters are discussed and verified by simulated and experimental signals for the sliding window data augmentation method.Based on LW-CNN and data augmentation,the real-time intelligent diagnosis of co-frequency is realized.Moreover,a real-time detection method of fault diagnosis algorithm is proposed for data acquisition to fault diagnosis.It is verified by experiments that the LW-CNN and sliding window methods are used with high accuracy and real-time performance.展开更多
Predicting the mechanical behaviors of structure and perceiving the anomalies in advance are essential to ensuring the safe operation of infrastructures in the long run.In addition to the incomplete consideration of i...Predicting the mechanical behaviors of structure and perceiving the anomalies in advance are essential to ensuring the safe operation of infrastructures in the long run.In addition to the incomplete consideration of influencing factors,the prediction time scale of existing studies is rough.Therefore,this study focuses on the development of a real-time prediction model by coupling the spatio-temporal correlation with external load through autoencoder network(ATENet)based on structural health monitoring(SHM)data.An autoencoder mechanism is performed to acquire the high-level representation of raw monitoring data at different spatial positions,and the recurrent neural network is applied to understanding the temporal correlation from the time series.Then,the obtained temporal-spatial information is coupled with dynamic loads through a fully connected layer to predict structural performance in next 12 h.As a case study,the proposed model is formulated on the SHM data collected from a representative underwater shield tunnel.The robustness study is carried out to verify the reliability and the prediction capability of the proposed model.Finally,the ATENet model is compared with some typical models,and the results indicate that it has the best performance.ATENet model is of great value to predict the realtime evolution trend of tunnel structure.展开更多
Objective Antimicrobial resistance(AMR)has become a global concern and is especially severe in China.To effectively and reliably provide AMR data,we developed a new high-throughput real-time PCR assay based on microfl...Objective Antimicrobial resistance(AMR)has become a global concern and is especially severe in China.To effectively and reliably provide AMR data,we developed a new high-throughput real-time PCR assay based on microfluidic dynamic technology,and screened multiple AMR genes in broiler fecal samples.Methods A high-throughput real-time PCR system with an new designed integrated fluidic circuit assay were performed AMR gene detection.A total of 273 broiler fecal samples collected from two geographically separated farms were screened AMR genes.Results The new assay with limits of detection ranging from 40.9 to 8,000 copies/reaction.The sensitivity rate,specificity rate,positive predictive value,negative predictive value and correct indices were 99.30%,98.08%,95.31%,99.79%,and 0.9755,respectively.Utilizing this assay,we demonstrate that AMR genes are widely spread,with positive detection rates ranging from 0 to 97.07%in 273 broiler fecal samples.bla CTX-M,bla TEM,mcr-1,fex A,cfr,optr A,and int I1 showed over 80%prevalence.The dissemination of AMR genes was distinct between the two farms.Conclusions We successfully established a new high-throughput real-time PCR assay applicable to AMR gene surveillance from fecal samples.The widespread existence of AMR genes detected in broiler farms highlights the current and severe problem of AMR.展开更多
基金funded by the Ongoing Research Funding Program(ORF-2025-890)King Saud University,Riyadh,Saudi Arabia and was supported by the Competitive Research Fund of theUniversity of Aizu,Japan.
文摘The exponential expansion of the Internet of Things(IoT),Industrial Internet of Things(IIoT),and Transportation Management of Things(TMoT)produces vast amounts of real-time streaming data.Ensuring system dependability,operational efficiency,and security depends on the identification of anomalies in these dynamic and resource-constrained systems.Due to their high computational requirements and inability to efficiently process continuous data streams,traditional anomaly detection techniques often fail in IoT systems.This work presents a resource-efficient adaptive anomaly detection model for real-time streaming data in IoT systems.Extensive experiments were carried out on multiple real-world datasets,achieving an average accuracy score of 96.06%with an execution time close to 7.5 milliseconds for each individual streaming data point,demonstrating its potential for real-time,resourceconstrained applications.The model uses Principal Component Analysis(PCA)for dimensionality reduction and a Z-score technique for anomaly detection.It maintains a low computational footprint with a sliding window mechanism,enabling incremental data processing and identification of both transient and sustained anomalies without storing historical data.The system uses a Multivariate Linear Regression(MLR)based imputation technique that estimates missing or corrupted sensor values,preserving data integrity prior to anomaly detection.The suggested solution is appropriate for many uses in smart cities,industrial automation,environmental monitoring,IoT security,and intelligent transportation systems,and is particularly well-suited for resource-constrained edge devices.
基金supported by the National Natural Science Foundation of China(No.62173158)the National Key Research and Development Program of China(No.2019YFC0119600)the Major Science and Technology Program of Hainan Province(No.ZDKJ202004).
文摘Class imbalance can substantially affect classification tasks using traditional classifiers,especially when identifying instances of minority categories.In addition to class imbalance,other challenges can also hinder accurate classification.Researchers have explored various approaches to mitigate the effects of class imbalance.However,most studies focus only on processing correlations within a single category of samples.This paper introduces an ensemble framework called Inter-and Intra-Class Overlapping Ensemble(llCOE),which incorporates two sampling methods.The first method,which is based on classification hardness undersampling,targets majority category samples by using simple samples as the foundation for classification and improving performance by focusing on samples near classification boundaries.The second method addresses the issue of overfitting minority category samples in undersampling and ensemble learning.To mitigate this,an adaptive augment hybrid sampling method is proposed,which enhances the classification boundary of samples and reduces overfitting.This paper conducts multiple experiments on 15 public datasets and concludes that the IlCOE ensemble framework outperforms other ensemble learning algorithms in classifying imbalanced data.
基金funded by the ICT Division of theMinistry of Posts,Telecommunications,and Information Technology of Bangladesh under Grant Number 56.00.0000.052.33.005.21-7(Tracking No.22FS15306)support from the University of Rajshahi.
文摘The Internet of Things(IoT)and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients.Recognizing Medical-Related Human Activities(MRHA)is pivotal for healthcare systems,particularly for identifying actions critical to patient well-being.However,challenges such as high computational demands,low accuracy,and limited adaptability persist in Human Motion Recognition(HMR).While some studies have integrated HMR with IoT for real-time healthcare applications,limited research has focused on recognizing MRHA as essential for effective patient monitoring.This study proposes a novel HMR method tailored for MRHA detection,leveraging multi-stage deep learning techniques integrated with IoT.The approach employs EfficientNet to extract optimized spatial features from skeleton frame sequences using seven Mobile Inverted Bottleneck Convolutions(MBConv)blocks,followed by Convolutional Long Short Term Memory(ConvLSTM)to capture spatio-temporal patterns.A classification module with global average pooling,a fully connected layer,and a dropout layer generates the final predictions.The model is evaluated on the NTU RGB+D 120 and HMDB51 datasets,focusing on MRHA such as sneezing,falling,walking,sitting,etc.It achieves 94.85%accuracy for cross-subject evaluations and 96.45%for cross-view evaluations on NTU RGB+D 120,along with 89.22%accuracy on HMDB51.Additionally,the system integrates IoT capabilities using a Raspberry Pi and GSM module,delivering real-time alerts via Twilios SMS service to caregivers and patients.This scalable and efficient solution bridges the gap between HMR and IoT,advancing patient monitoring,improving healthcare outcomes,and reducing costs.
基金the Key National Natural Science Foundation of China(No.U1864211)the National Natural Science Foundation of China(No.11772191)the Natural Science Foundation of Shanghai(No.21ZR1431500)。
文摘Industrial data mining usually deals with data from different sources.These heterogeneous datasets describe the same object in different views.However,samples from some of the datasets may be lost.Then the remaining samples do not correspond one-to-one correctly.Mismatched datasets caused by missing samples make the industrial data unavailable for further machine learning.In order to align the mismatched samples,this article presents a cooperative iteration matching method(CIMM)based on the modified dynamic time warping(DTW).The proposed method regards the sequentially accumulated industrial data as the time series.Mismatched samples are aligned by the DTW.In addition,dynamic constraints are applied to the warping distance of the DTW process to make the alignment more efficient.Then a series of models are trained with the cumulated samples iteratively.Several groups of numerical experiments on different missing patterns and missing locations are designed and analyzed to prove the effectiveness and the applicability of the proposed method.
基金funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2025R435),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Psychological distress detection plays a critical role in modern healthcare,especially in ambient environments where continuous monitoring is essential for timely intervention.Advances in sensor technology and artificial intelligence(AI)have enabled the development of systems capable of mental health monitoring using multimodal data.However,existing models often struggle with contextual adaptation and real-time decision-making in dynamic settings.This paper addresses these challenges by proposing TRANS-HEALTH,a hybrid framework that integrates transformer-based inference with Belief-Desire-Intention(BDI)reasoning for real-time psychological distress detection.The framework utilizes a multimodal dataset containing EEG,GSR,heart rate,and activity data to predict distress while adapting to individual contexts.The methodology combines deep learning for robust pattern recognition and symbolic BDI reasoning to enable adaptive decision-making.The novelty of the approach lies in its seamless integration of transformermodelswith BDI reasoning,providing both high accuracy and contextual relevance in real time.Performance metrics such as accuracy,precision,recall,and F1-score are employed to evaluate the system’s performance.The results show that TRANS-HEALTH outperforms existing models,achieving 96.1% accuracy with 4.78 ms latency and significantly reducing false alerts,with an enhanced ability to engage users,making it suitable for deployment in wearable and remote healthcare environments.
文摘The interleaving/multiplexing technique was used to realize a 200?MHz real time data acquisition system. Two 100?MHz ADC modules worked parallelly and every ADC plays out data in ping pang fashion. The design improved the system conversion rata to 200?MHz and reduced the speed of data transporting and storing to 50?MHz. The high speed HDPLD and ECL logic parts were used to control system timing and the memory address. The multi layer print board and the shield were used to decrease interference produced by the high speed circuit. The system timing was designed carefully. The interleaving/multiplexing technique could improve the system conversion rata greatly while reducing the speed of external digital interfaces greatly. The design resolved the difficulties in high speed system effectively. The experiment proved the data acquisition system is stable and accurate.
基金The High Technology Research Plan of Jiangsu Prov-ince (No.BG2004034)the Foundation of Graduate Creative Program ofJiangsu Province (No.xm04-36).
文摘A novel data streams partitioning method is proposed to resolve problems of range-aggregation continuous queries over parallel streams for power industry.The first step of this method is to parallel sample the data,which is implemented as an extended reservoir-sampling algorithm.A skip factor based on the change ratio of data-values is introduced to describe the distribution characteristics of data-values adaptively.The second step of this method is to partition the fluxes of data streams averagely,which is implemented with two alternative equal-depth histogram generating algorithms that fit the different cases:one for incremental maintenance based on heuristics and the other for periodical updates to generate an approximate partition vector.The experimental results on actual data prove that the method is efficient,practical and suitable for time-varying data streams processing.
基金The National Basic Research Program of China(973Program)(No.2009CB320505)the Natural Science Foundation of Jiangsu Province(No. BK2008288)+1 种基金the Excellent Young Teachers Program of Southeast University(No.4009001018)the Open Research Program of Key Laboratory of Computer Network of Guangdong Province (No. CCNL200706)
文摘In order to improve the precision of super point detection and control measurement resource consumption, this paper proposes a super point detection method based on sampling and data streaming algorithms (SDSD), and proves that only sources or destinations with a lot of flows can be sampled probabilistically using the SDSD algorithm. The SDSD algorithm uses both the IP table and the flow bloom filter (BF) data structures to maintain the IP and flow information. The IP table is used to judge whether an IP address has been recorded. If the IP exists, then all its subsequent flows will be recorded into the flow BF; otherwise, the IP flow is sampled. This paper also analyzes the accuracy and memory requirements of the SDSD algorithm , and tests them using the CERNET trace. The theoretical analysis and experimental tests demonstrate that the most relative errors of the super points estimated by the SDSD algorithm are less than 5%, whereas the results of other algorithms are about 10%. Because of the BF structure, the SDSD algorithm is also better than previous algorithms in terms of memory consumption.
基金supported by CNPC-CZU Innovation Alliancesupported by the Program of Polar Drilling Environmental Protection and Waste Treatment Technology (2022YFC2806403)。
文摘In petroleum engineering,real-time lithology identification is very important for reservoir evaluation,drilling decisions and petroleum geological exploration.A lithology identification method while drilling based on machine learning and mud logging data is studied in this paper.This method can effectively utilize downhole parameters collected in real-time during drilling,to identify lithology in real-time and provide a reference for optimization of drilling parameters.Given the imbalance of lithology samples,the synthetic minority over-sampling technique(SMOTE)and Tomek link were used to balance the sample number of five lithologies.Meanwhile,this paper introduces Tent map,random opposition-based learning and dynamic perceived probability to the original crow search algorithm(CSA),and establishes an improved crow search algorithm(ICSA).In this paper,ICSA is used to optimize the hyperparameter combination of random forest(RF),extremely random trees(ET),extreme gradient boosting(XGB),and light gradient boosting machine(LGBM)models.In addition,this study combines the recognition advantages of the four models.The accuracy of lithology identification by the weighted average probability model reaches 0.877.The study of this paper realizes high-precision real-time lithology identification method,which can provide lithology reference for the drilling process.
基金supported by the National Key R&D Program of China,No.2022YFC3006302.
文摘Imbalance is a distinctive feature of many datasets,and how to make the dataset balanced become a hot topic in the machine learning field.The Synthetic Minority Oversampling Technique(SMOTE)is the classical method to solve this problem.Although much research has been conducted on SMOTE,there is still the problem of synthetic sample singularity.To solve the issues of class imbalance and diversity of generated samples,this paper proposes a hybrid resampling method for binary imbalanced data sets,RE-SMOTE,which is designed based on the improvements of two oversampling methods parameter-free SMOTE(PF-SMOTE)and SMOTE-Weighted Ensemble Nearest Neighbor(SMOTE-WENN).Initially,minority class samples are divided into safe and boundary minority categories.Boundary minority samples are regenerated through linear interpolation with the nearest majority class samples.In contrast,safe minority samples are randomly generated within a circular range centered on the initial safe minority samples with a radius determined by the distance to the nearest majority class samples.Furthermore,we use Weighted Edited Nearest Neighbor(WENN)and relative density methods to clean the generated samples and remove the low-quality samples.Relative density is calculated based on the ratio of majority to minority samples among the reverse k-nearest neighbor samples.To verify the effectiveness and robustness of the proposed model,we conducted a comprehensive experimental study on 40 datasets selected from real applications.The experimental results show the superiority of radius estimation-SMOTE(RE-SMOTE)over other state-of-the-art methods.Code is available at:https://github.com/blue9792/RE-SMOTE(accessed on 30 September 2024).
基金Supported by the Open Researches Fund Program of L IESMARS(WKL(0 0 ) 0 30 2 )
文摘Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, image processing, and etc. We combine sampling technique with DBSCAN algorithm to cluster large spatial databases, and two sampling based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces sampling technique inside DBSCAN, and the other uses sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large scale spatial databases.
基金Supported by the National Key Research and Development Program of China(Nos.2016YFC1402000,2018YFC1407003,2017YFC1405300)
文摘Offshore waters provide resources for human beings,while on the other hand,threaten them because of marine disasters.Ocean stations are part of offshore observation networks,and the quality of their data is of great significance for exploiting and protecting the ocean.We used hourly mean wave height,temperature,and pressure real-time observation data taken in the Xiaomaidao station(in Qingdao,China)from June 1,2017,to May 31,2018,to explore the data quality using eight quality control methods,and to discriminate the most effective method for Xiaomaidao station.After using the eight quality control methods,the percentages of the mean wave height,temperature,and pressure data that passed the tests were 89.6%,88.3%,and 98.6%,respectively.With the marine disaster(wave alarm report)data,the values failed in the test mainly due to the influence of aging observation equipment and missing data transmissions.The mean wave height is often affected by dynamic marine disasters,so the continuity test method is not effective.The correlation test with other related parameters would be more useful for the mean wave height.
文摘China's continental deposition basins are characterized by complex geological structures and various reservoir lithologies. Therefore, high precision exploration methods are needed. High density spatial sampling is a new technology to increase the accuracy of seismic exploration. We briefly discuss point source and receiver technology, analyze the high density spatial sampling in situ method, introduce the symmetric sampling principles presented by Gijs J. O. Vermeer, and discuss high density spatial sampling technology from the point of view of wave field continuity. We emphasize the analysis of the high density spatial sampling characteristics, including the high density first break advantages for investigation of near surface structure, improving static correction precision, the use of dense receiver spacing at short offsets to increase the effective coverage at shallow depth, and the accuracy of reflection imaging. Coherent noise is not aliased and the noise analysis precision and suppression increases as a result. High density spatial sampling enhances wave field continuity and the accuracy of various mathematical transforms, which benefits wave field separation. Finally, we point out that the difficult part of high density spatial sampling technology is the data processing. More research needs to be done on the methods of analyzing and processing huge amounts of seismic data.
基金supported by the National Key Research and Development Program of China(2018YFB1003700)the Scientific and Technological Support Project(Society)of Jiangsu Province(BE2016776)+2 种基金the“333” project of Jiangsu Province(BRA2017228 BRA2017401)the Talent Project in Six Fields of Jiangsu Province(2015-JNHB-012)
文摘For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic minority over-sampling technique(SMOTE) is specifically designed for learning from imbalanced datasets, generating synthetic minority class examples by interpolating between minority class examples nearby. However, the SMOTE encounters the overgeneralization problem. The densitybased spatial clustering of applications with noise(DBSCAN) is not rigorous when dealing with the samples near the borderline.We optimize the DBSCAN algorithm for this problem to make clustering more reasonable. This paper integrates the optimized DBSCAN and SMOTE, and proposes a density-based synthetic minority over-sampling technique(DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups, including core samples, borderline samples and noise samples, and then the noise samples of minority class is removed to synthesize more effective samples. In order to make full use of the information of core samples and borderline samples,different strategies are used to over-sample core samples and borderline samples. Experiments show that DSMOTE can achieve better results compared with SMOTE and Borderline-SMOTE in terms of precision, recall and F-value.
基金funded by the National Natural Science Foundation of China(Grant No.41941019)the State Key Laboratory of Hydroscience and Engineering(Grant No.2019-KY-03)。
文摘Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines(TBMs).During the TBM tunnelling process,a large number of operation data are generated,reflecting the interaction between the TBM system and surrounding rock,and these data can be used to evaluate the rock mass quality.This study proposed a stacking ensemble classifier for the real-time prediction of the rock mass classification using TBM operation data.Based on the Songhua River water conveyance project,a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes are obtained after data preprocessing.Then,through the tree-based feature selection method,10 key TBM operation parameters are selected,and the mean values of the 10 selected features in the stable phase after removing outliers are calculated as the inputs of classifiers.The preprocessed data are randomly divided into the training set(90%)and test set(10%)using simple random sampling.Besides stacking ensemble classifier,seven individual classifiers are established as the comparison.These classifiers include support vector machine(SVM),k-nearest neighbors(KNN),random forest(RF),gradient boosting decision tree(GBDT),decision tree(DT),logistic regression(LR)and multilayer perceptron(MLP),where the hyper-parameters of each classifier are optimised using the grid search method.The prediction results show that the stacking ensemble classifier has a better performance than individual classifiers,and it shows a more powerful learning and generalisation ability for small and imbalanced samples.Additionally,a relative balance training set is obtained by the synthetic minority oversampling technique(SMOTE),and the influence of sample imbalance on the prediction performance is discussed.
基金supported by the National Key Research Program(No.2018YFB1601105,No.2018YFB1601102)the Natural Science Foundation of China(No.41975165,No.U1811463)Chongqing Science and Technology Project(No.cstc2019jscxfxydX0035)。
文摘High-resolution vehicular emissions inventories are important for managing vehicular pollution and improving urban air quality. This study developed a vehicular emission inventory with high spatio-temporal resolution in the main urban area of Chongqing, based on realtime traffic data from 820 RFID detectors covering 454 roads, and the differences in spatiotemporal emission characteristics between inner and outer districts were analysed. The result showed that the daily vehicular emission intensities of CO, hydrocarbons, PM2.5, PM10,and NO_(x) were 30.24, 3.83, 0.18, 0.20, and 8.65 kg/km per day, respectively, in the study area during 2018. The pollutants emission intensities in inner district were higher than those in outer district. Light passenger cars(LPCs) were the main contributors of all-day CO emissions in the inner and outer districts, from which the contributors of NO_(x) emissions were different. Diesel and natural gas buses were major contributors of daytime NO_(x) emissions in inner districts, accounting for 40.40%, but buses and heavy duty trucks(HDTs) were major contributors in outer districts. At nighttime, due to the lifting of truck restrictions and suspension of buses, HDTs become the main NO_(x) contributor in both inner and outer districts,and its three NO_(x) emission peak hours were found, which are different to the peak hours of total NO_(x) emission by all vehicles. Unlike most other cities, bridges and connecting channels are always emission hotspots due to long-time traffic congestion. This knowledge will help fully understand vehicular emissions characteristics and is useful for policymakers to design precise prevention and control measures.
文摘The application and development of a wide-area measurement system(WAMS)has enabled many applications and led to several requirements based on dynamic measurement data.Such data are transmitted as big data information flow.To ensure effective transmission of wide-frequency electrical information by the communication protocol of a WAMS,this study performs real-time traffic monitoring and analysis of the data network of a power information system,and establishes corresponding network optimization strategies to solve existing transmission problems.This study utilizes the traffic analysis results obtained using the current real-time dynamic monitoring system to design an optimization strategy,covering the optimization in three progressive levels:the underlying communication protocol,source data,and transmission process.Optimization of the system structure and scheduling optimization of data information are validated to be feasible and practical via tests.
基金Supported by National Natural Science Foundation of China(Grant Nos.51875031,52242507)Beijing Municipal Natural Science Foundation of China(Grant No.3212010)Beijing Municipal Youth Backbone Personal Project of China(Grant No.2017000020124 G018).
文摘The co-frequency vibration fault is one of the common faults in the operation of rotating equipment,and realizing the real-time diagnosis of the co-frequency vibration fault is of great significance for monitoring the health state and carrying out vibration suppression of the equipment.In engineering scenarios,co-frequency vibration faults are highlighted by rotational frequency and are difficult to identify,and existing intelligent methods require more hardware conditions and are exclusively time-consuming.Therefore,Lightweight-convolutional neural networks(LW-CNN)algorithm is proposed in this paper to achieve real-time fault diagnosis.The critical parameters are discussed and verified by simulated and experimental signals for the sliding window data augmentation method.Based on LW-CNN and data augmentation,the real-time intelligent diagnosis of co-frequency is realized.Moreover,a real-time detection method of fault diagnosis algorithm is proposed for data acquisition to fault diagnosis.It is verified by experiments that the LW-CNN and sliding window methods are used with high accuracy and real-time performance.
基金This work is supported by the National Natural Science Foundation of China(Grant No.51991392)Key Deployment Projects of Chinese Academy of Sciences(Grant No.ZDRW-ZS-2021-3-3)the Second Tibetan Plateau Scientific Expedition and Research Program(STEP)(Grant No.2019QZKK0904).
文摘Predicting the mechanical behaviors of structure and perceiving the anomalies in advance are essential to ensuring the safe operation of infrastructures in the long run.In addition to the incomplete consideration of influencing factors,the prediction time scale of existing studies is rough.Therefore,this study focuses on the development of a real-time prediction model by coupling the spatio-temporal correlation with external load through autoencoder network(ATENet)based on structural health monitoring(SHM)data.An autoencoder mechanism is performed to acquire the high-level representation of raw monitoring data at different spatial positions,and the recurrent neural network is applied to understanding the temporal correlation from the time series.Then,the obtained temporal-spatial information is coupled with dynamic loads through a fully connected layer to predict structural performance in next 12 h.As a case study,the proposed model is formulated on the SHM data collected from a representative underwater shield tunnel.The robustness study is carried out to verify the reliability and the prediction capability of the proposed model.Finally,the ATENet model is compared with some typical models,and the results indicate that it has the best performance.ATENet model is of great value to predict the realtime evolution trend of tunnel structure.
基金supported by the National Natural Science Foundation of China [Grant agreement 31502124]the National Science and Technology Major Project of China [Grant agreement 2018ZX10733402]
文摘Objective Antimicrobial resistance(AMR)has become a global concern and is especially severe in China.To effectively and reliably provide AMR data,we developed a new high-throughput real-time PCR assay based on microfluidic dynamic technology,and screened multiple AMR genes in broiler fecal samples.Methods A high-throughput real-time PCR system with an new designed integrated fluidic circuit assay were performed AMR gene detection.A total of 273 broiler fecal samples collected from two geographically separated farms were screened AMR genes.Results The new assay with limits of detection ranging from 40.9 to 8,000 copies/reaction.The sensitivity rate,specificity rate,positive predictive value,negative predictive value and correct indices were 99.30%,98.08%,95.31%,99.79%,and 0.9755,respectively.Utilizing this assay,we demonstrate that AMR genes are widely spread,with positive detection rates ranging from 0 to 97.07%in 273 broiler fecal samples.bla CTX-M,bla TEM,mcr-1,fex A,cfr,optr A,and int I1 showed over 80%prevalence.The dissemination of AMR genes was distinct between the two farms.Conclusions We successfully established a new high-throughput real-time PCR assay applicable to AMR gene surveillance from fecal samples.The widespread existence of AMR genes detected in broiler farms highlights the current and severe problem of AMR.