Abstract: This paper reviews sixty-eight research articles published between 2000 and 2017, as well as textbooks, that employed four classification algorithms as their main statistical tools: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN). The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performance, strengths and shortcomings of each algorithm were examined, and a conclusion was drawn on which one performs best. The literature reviewed indicates that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand, but it becomes significantly slower as the dataset grows, and the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability to deal with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning, its high computational complexity, the numerous NN architectures to choose from and the large number of training algorithms, most researchers recommend SVM and RF as easier, widely used methods that repeatedly achieve high accuracy and are often faster to implement.
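The two KNN drawbacks named in this abstract (prediction cost that grows with the dataset, and a result that depends on K) can be seen in a minimal brute-force sketch. The function name `knn_predict` and the toy points are illustrative assumptions, not taken from the reviewed paper.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Brute force: one distance per training point, so the cost of a single
    # prediction grows linearly with the size of the training set.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two points of class "a", three of class "b".
points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
labels = ["a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.2, 0.1), k=1))  # nearest neighbor only
print(knn_predict(points, labels, (0.2, 0.1), k=5))  # every point votes
```

With the same query, K = 1 and K = 5 give different answers here, which is the kind of sensitivity that makes K hard to set.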
Abstract: This paper presents an experiment that uses OpenBCI to collect data for two hand gestures and decodes the signal to distinguish between them. The signal was acquired with three electrodes on the subject's forearm and transferred in one channel. After applying a Butterworth bandpass filter, we chose a novel way to detect the gesture action segment. Instead of using the moving average algorithm, which is based on the calculation of energy, we developed an algorithm based on the Hilbert transform to find a dynamic threshold and identify the action segment. Four features were extracted from each activity section, generating feature vectors for classification. For classification, we compared K-nearest-neighbors (KNN) and support vector machine (SVM) on a relatively small number of samples. Most experiments rely on a large quantity of data to pursue a highly fitted model, but in certain circumstances enough training data cannot be obtained, which makes it imperative to explore the best classification method for small samples. Although KNN is known for its simplicity and practicability, it is a relatively time-consuming method. SVM, on the other hand, performs better in terms of time requirement and recognition accuracy, owing to its use of a different risk minimization principle. Experimental results show that the average recognition rate of SVM is 1.25% higher than that of KNN, while its running time is 2.031 s shorter.
Funding: This work is supported by the National Natural Science Foundation of China under Grant Nos. U1636215 and 61902082, the Guangdong Key R&D Program of China (2019B010136003), and the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2019).
Abstract: Recommender systems help people find what they really need. Academic papers are important achievements for researchers, who often have a great many venues to choose from when submitting their work. To make it more efficient to select the most suitable journal, journal recommender systems (JRS) can automatically provide a small number of candidate journals based on key information such as the title and the abstract. However, users or journal owners may attack the system for their own purposes. In this paper, we discuss adversarial attacks against content-based filtering JRS. We propose both a targeted attack method, which makes some target journals appear more often in the system, and a non-targeted attack method, which makes the system provide incorrect recommendations. We also conduct extensive experiments to validate the proposed methods. We hope this paper helps improve JRS by raising awareness that such adversarial attacks exist.
Funding: Supported in part by the Science & Technology Department of Shanghai (05dz15004).
Abstract: In wireless sensor networks, target classification differs from that in centralized sensing systems because of distributed detection, wireless communication and limited resources. We study the classification of moving vehicles in wireless sensor networks using the acoustic signals that the vehicles emit. Three techniques are combined in this paper: wavelet decomposition, weighted k-nearest-neighbor and Dempster-Shafer theory. We use real-world experimental data to validate the classification methods. The results show that the wavelet-based feature extraction method can extract stable features from acoustic signals, and that fusion with Dempster's rule improves classification performance.
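The fusion step mentioned above, Dempster's rule of combination, can be sketched as follows. This is a generic textbook form of the rule, not the paper's implementation; the function name `dempster_combine`, the vehicle classes and the mass values are assumptions chosen for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of hypotheses -> mass)."""
    combined = {}
    conflict = 0.0
    for set1, mass1 in m1.items():
        for set2, mass2 in m2.items():
            overlap = set1 & set2
            if overlap:
                # Agreeing evidence supports the intersection of the two sets.
                combined[overlap] = combined.get(overlap, 0.0) + mass1 * mass2
            else:
                # Disjoint sets contribute to the conflict mass K.
                conflict += mass1 * mass2
    if not combined:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Normalize by 1 - K so the combined masses sum to one again.
    return {hyp: mass / (1.0 - conflict) for hyp, mass in combined.items()}


# Hypothetical evidence from two sensor nodes about one vehicle.
CAR, TRUCK = frozenset({"car"}), frozenset({"truck"})
EITHER = CAR | TRUCK  # mass left on "don't know"
node1 = {CAR: 0.6, TRUCK: 0.1, EITHER: 0.3}
node2 = {CAR: 0.7, TRUCK: 0.1, EITHER: 0.2}

fused = dempster_combine(node1, node2)
print({tuple(sorted(h)): round(m, 4) for h, m in fused.items()})
```

In this toy case the fused belief in "car" is higher than either node's individual belief, which illustrates how combining sensors can sharpen a classification.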
Abstract: The exponential growth of Internet and network usage has necessitated heightened security measures to protect against data and network breaches. Intrusions, executed through network packets, are difficult for firewalls to detect and prevent because legitimate and intrusion traffic look alike. The vast volume of network traffic also complicates most network monitoring systems and algorithms. Several intrusion detection methods have been proposed, with machine learning techniques regarded as promising for dealing with these incidents. This study presents an intrusion detection system (IDS) based on stacking ensemble learning. The proposed system employs pre-processing techniques to enhance classification efficiency and integrates seven machine learning algorithms. The stacking ensemble technique increases performance by incorporating three base models (Random Forest, Decision Tree, and k-Nearest-Neighbors) and a meta-model, the Logistic Regression algorithm. Evaluated on the UNSW-NB15 dataset, the proposed IDS achieved an accuracy of 96.16% in the training phase and 97.95% in the testing phase, with precision of 97.78% and 98.40% for training and testing, respectively. The obtained results demonstrate improvements in other measurement criteria.
Abstract: This paper investigates consensus of flocks consisting of n autonomous agents in the plane, where each agent has the same constant moving speed v_n and updates its heading to the average heading of its k_n nearest agents, with v_n and k_n being two prescribed parameters depending on n. Such a topological interaction rule is referred to as the k_n-nearest-neighbors rule, which has been validated for a class of birds by biologists and verified to be robust with respect to disturbances. A theoretical analysis is presented for this flocking model under a random framework with a large population, but without imposing any a priori connectivity assumptions. We show that the minimum k_n needed for consensus is of the order O(log n) in a certain sense. To be precise, there exist two constants C_1 > C_2 > 0 such that, if k_n > C_1 log n, then the flocking model achieves consensus for any initial headings with high probability, provided that the speed v_n is suitably small. On the other hand, if k_n < C_2 log n, then for large n, with probability 1, there exist some initial headings such that consensus cannot be achieved, regardless of the value of v_n.
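One update step of the nearest-neighbors rule described above might look like the sketch below. Two details are assumptions, since the abstract does not specify them: each agent counts itself among its nearest agents, and headings are averaged as a circular mean. The function name `flock_step` is hypothetical.

```python
import math


def flock_step(positions, headings, k, speed):
    """Advance every agent by one step of a k-nearest-neighbors heading rule."""
    n = len(positions)
    new_headings = []
    for i in range(n):
        # Rank all agents by distance to agent i. Agent i is at distance 0,
        # so it is included among its own neighbors (an assumption).
        order = sorted(
            range(n), key=lambda j: math.dist(positions[i], positions[j])
        )
        neighbors = order[:k]
        # Circular mean of the neighbors' headings (an assumption).
        sin_sum = sum(math.sin(headings[j]) for j in neighbors)
        cos_sum = sum(math.cos(headings[j]) for j in neighbors)
        new_headings.append(math.atan2(sin_sum, cos_sum))
    # Every agent moves at the same constant speed along its new heading.
    new_positions = [
        (x + speed * math.cos(theta), y + speed * math.sin(theta))
        for (x, y), theta in zip(positions, new_headings)
    ]
    return new_positions, new_headings


positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
headings = [0.0, 0.2, 0.4, 0.6]
positions, headings = flock_step(positions, headings, k=4, speed=0.01)
print(headings)
```

With k equal to the number of agents, all headings coincide after one step. The abstract's result concerns the much harder case where k grows only like log n.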
Abstract: Online advertisements have a significant influence on the success or failure of a business. It is therefore important to measure the impact of an advertisement before uploading it online, which can be done by calculating the Click Through Rate (CTR). Unfortunately, this method is not eco-friendly, since the clicks must first be gathered from users before the CTR can be computed. This is where CTR prediction comes in handy. Advertisement CTR prediction relies on users' logs of click information. Accurate prediction of CTR is a challenging and critical process for e-advertising platforms these days. CTR prediction uses machine learning techniques to determine how often an online advertisement is clicked by potential clients: the more clicks, the more successful the ad. In this study we develop a machine-learning-based click-through rate prediction model that generates accurate results with low computational power consumption. We used four classification techniques, namely K Nearest Neighbor (KNN), Logistic Regression, Random Forest, and Extreme Gradient Boosting (XGBoost). The study was performed on the Click-Through Rate Prediction Competition Dataset, which consists of click-through data ordered chronologically and collected over 10 days. Experimental results reveal that XGBoost produced a ROC-AUC of 0.76 with a reduced number of features.
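The two quantities named in this abstract, CTR and ROC-AUC, can be computed as in the sketch below. The click log and model scores are made-up toy values, and the pairwise AUC shown here is a simple quadratic-time formulation chosen for readability rather than the method used in the study.

```python
def click_through_rate(clicks, impressions):
    """CTR is the fraction of ad impressions that resulted in a click."""
    return clicks / impressions if impressions else 0.0


def roc_auc(labels, scores):
    """Probability that a random clicked ad outscores a random unclicked one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical log: 1 = clicked, 0 = not clicked, with predicted scores.
clicked = [1, 0, 1, 0, 0]
predicted = [0.9, 0.4, 0.35, 0.2, 0.8]

print(click_through_rate(sum(clicked), len(clicked)))
print(roc_auc(clicked, predicted))
```

A ROC-AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the 0.76 reported above.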
Abstract: This paper presents a vision-based fingertip-writing character recognition system. The overall system is implemented with a CMOS image camera on an FPGA chip. A blue cover is mounted on the tip of a finger to simplify fingertip detection and to enhance recognition accuracy. For each character stroke, 8 sample points (including the start and end points) are recorded. The 7 tangent angles between consecutive sampled points are also recorded as features. In addition, 3 feature angles are extracted: the angles of the triangle formed by the start point, the end point and the average point of all 8 sampled points. Based on these key feature angles, a simple template-matching K-nearest-neighbor classifier is applied to distinguish each character stroke. Experimental results showed that the system can successfully recognize fingertip-written character strokes of digits and lowercase letters with an accuracy of almost 100%. Overall, the proposed fingertip-writing recognition system provides an easy-to-use and accurate visual character input method.
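The ten angle features described above (7 tangent angles plus 3 triangle angles) could be computed roughly as follows. This is a sketch under assumptions: angles are in radians, tangent angles are measured with `atan2`, and the function name `stroke_features` and the sample stroke are hypothetical.

```python
import math


def _angle_at(vertex, p, q):
    """Interior angle at `vertex` of the triangle (vertex, p, q), in radians."""
    ux, uy = p[0] - vertex[0], p[1] - vertex[1]
    vx, vy = q[0] - vertex[0], q[1] - vertex[1]
    # atan2(|cross|, dot) avoids a division and stays finite for short edges.
    return math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy)


def stroke_features(points):
    """Return tangent angles between consecutive points, then 3 triangle angles."""
    tangents = [
        math.atan2(y2 - y1, x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]
    start, end = points[0], points[-1]
    mean = (
        sum(x for x, _ in points) / len(points),
        sum(y for _, y in points) / len(points),
    )
    triangle = [
        _angle_at(start, end, mean),
        _angle_at(end, start, mean),
        _angle_at(mean, start, end),
    ]
    return tangents + triangle


# Hypothetical stroke sampled at 8 points (a small arch).
stroke = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 2), (5, 1), (6, 0), (7, 0)]
features = stroke_features(stroke)
print(len(features), [round(f, 3) for f in features])
```

A feature vector like this one could then be compared against stored templates with a nearest-neighbor rule, as the abstract describes.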
Funding: This work was supported in part by the Ministry of Science and ICT (MSIT), Korea, under the Information and Technology Research Center (ITRC) support program (IITP-2022-2018-0-01426), and in part by the National Research Foundation of Korea (NRF) funded by the Korea government (MSIT) (No. 2021R1A2C1013150).
Abstract: A cognitive radio network (CRN) intelligently utilizes the available spectral resources by sensing and learning from the radio environment to maximize spectrum utilization. In CRNs, the secondary users (SUs) opportunistically access the primary users' (PUs) spectrum. Unambiguous detection of PU channel occupancy is therefore the most critical aspect of CRN operation. Cooperative spectrum sensing (CSS) is rated as the best choice for making reliable sensing decisions. This paper employs machine learning tools to sense the PU channels reliably in CSS. The sensing parameters are reconfigured to maximize spectrum utilization while reducing sensing error and cost and improving channel throughput. The fine-k-nearest neighbor algorithm (FKNN) employed in this paper estimates the number of samples based on the nature of the channel under specific detection and false alarm probability demands. The simulation results reveal that the sensing cost is suppressed by reducing the sensing time and exploiting the traditional fusion rules, validating the effectiveness of the proposed scheme. Furthermore, the global decision made at the fusion center (FC) based on the modified sensing samples results in low energy consumption, higher throughput, and improved detection with low error probabilities.
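The abstract does not say which fusion rules it uses. As a hedged illustration, the hard-decision rules most often called traditional in cooperative spectrum sensing (OR, AND and majority) are all special cases of a k-out-of-N vote at the fusion center. The function names and the sample decisions below are assumptions.

```python
def k_out_of_n(decisions, k):
    """Declare the PU channel occupied if at least k local decisions say so."""
    return sum(decisions) >= k


def or_rule(decisions):
    return k_out_of_n(decisions, 1)  # any single detection is enough


def and_rule(decisions):
    return k_out_of_n(decisions, len(decisions))  # all SUs must agree


def majority_rule(decisions):
    return k_out_of_n(decisions, len(decisions) // 2 + 1)  # strict majority


# Hypothetical local decisions from five secondary users (1 = PU detected).
local = [1, 0, 0, 1, 0]
print(or_rule(local), and_rule(local), majority_rule(local))
```

The OR rule favors detection at the cost of more false alarms and the AND rule does the opposite, which is why the choice of rule interacts with the detection and false alarm demands mentioned above.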
Funding: This research was supported by the National Natural Science Foundation of China (30370758) and the Program for New Century Excellent Talents in Universities (NCET) of the Ministry of Education to Dr. Xu Chenwu (NCET-05-0502).
Abstract: Several typical supervised clustering methods, namely Gaussian mixture model-based supervised clustering (GMM), k-nearest-neighbor (KNN), binary support vector machines (SVMs) and multiclass support vector machines (MC-SVMs), were employed to classify computer simulation data and two real microarray expression datasets. False positives, false negatives, true positives, true negatives, clustering accuracy and Matthews' correlation coefficient (MCC) were compared among these methods. The results are as follows. (1) In classifying thousands of gene expression data, the two GMM methods have the highest clustering accuracy and the fewest overall FP+FN errors, on the assumption that the whole set of microarray data is a finite mixture of multivariate Gaussian distributions. Furthermore, when the number of training samples is very small, the GMM-II method is more accurate than the GMM-I method. (2) In general, the MC-SVMs are more robust and more practical: they are less sensitive to the curse of dimensionality, are second only to the GMM methods in clustering accuracy on thousands of gene expression data, and are more robust to a small number of high-dimensional gene expression samples than other techniques. (3) Of the MC-SVMs, OVO and DAGSVM perform better on large sample sizes, whereas the five MC-SVM methods perform very similarly on moderate sample sizes. OVR, WW and CS yield better results when sample sizes are small. It is therefore recommended that at least two candidate methods, chosen on the basis of the real data features and experimental conditions, be performed and compared to obtain a better clustering result.
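For reference, the two summary metrics compared in this abstract can be computed from the four confusion-matrix counts as sketched below. The counts in the example are invented for illustration, and returning 0 when the MCC denominator vanishes is a common convention rather than something the abstract states.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews' correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): report 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for one classifier on 100 samples.
tp, tn, fp, fn = 40, 45, 5, 10
print(accuracy(tp, tn, fp, fn))
print(round(matthews_corrcoef(tp, tn, fp, fn), 4))
```

Unlike accuracy, MCC uses all four counts symmetrically, so it stays informative when the two classes are of very different sizes.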
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 51978481).
Abstract: During the pre-design stage of buildings, reliable long-term prediction of thermal loads is significant for cooling/heating system configuration and efficient operation. This paper proposes a surrogate modeling method to predict all-year hourly cooling/heating loads in high resolution for retail, hotel, and office buildings. A total of 16384 surrogate models are simulated in EnergyPlus to generate the load database, which contains 7 crucial building features as inputs and hourly loads as outputs. K-nearest-neighbors (KNN) is chosen as the data-driven algorithm to approximate the surrogates for load prediction. With test samples from the database, the performance of five different spatial metrics for KNN is evaluated and optimized. Results show that the Manhattan distance is the optimal metric, with the highest efficient hour rates of 93.57% and 97.14% for cooling and heating loads in office buildings. The method is verified by predicting the thermal loads of a given district in Shanghai, China. The mean absolute percentage errors (MAPE) are 5.26% and 6.88% for cooling and heating loads, respectively, and 5.63% for the annual thermal loads. The proposed surrogate modeling method meets the precision requirement of engineering in the building pre-design stage and achieves fast prediction of all-year hourly thermal loads at the district level. As a data-driven approximation, it does not require as much detailed building information as the commonly used physics-based methods. By pre-simulating sufficient prototypical models, the method also overcomes the missing-data gaps in current data-driven methods.
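A minimal sketch of the two ingredients named above, KNN prediction under the Manhattan distance and the MAPE error measure, is given below. It predicts a single scalar load rather than a full hourly profile, and the two-feature toy database, the function names and the choice of an unweighted neighbor average are all assumptions.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_regress(train_x, train_y, query, k=3):
    """Predict a load as the plain average over the k nearest database entries."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: manhattan(train_x[i], query)
    )[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical database: two normalized building features -> one load value.
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
loads = [10.0, 12.0, 14.0, 50.0]

print(knn_regress(features, loads, (0.2, 0.2), k=3))
print(mape([100.0, 200.0], [90.0, 220.0]))
```

Swapping `manhattan` for another distance function is the only change needed to compare spatial metrics, which mirrors the comparison the abstract reports.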
Funding: Supported by the National Key R&D Program of China (No. 2021YFA1001300), the Natural Science Foundation of Hunan Province (Nos. 2021JJ30084, 2022JJ40030), the National Natural Science Foundation of China (Nos. 12271150, 12101216), Hong Kong Research Grants Council grant 15303121, the Hong Kong Polytechnic University Postdoctoral Research Fund 1-W261, and the Guangdong Basic and Applied Basic Research Foundation (No. 2024A1515012548).
Abstract: In this paper, a model reduction method based on κ-nearest-neighbors is presented for parametrized nonlocal partial differential equations (PDEs). In comparison to standard local PDEs, the stiffness matrix of the corresponding nonlocal model loses sparsity due to the nonlocal interaction parameter δ. In particular, the nonlocal model contains uncertain parameters, which increases the computational complexity. To improve computational efficiency, we combine κ-nearest-neighbors with the model reduction method to construct efficient surrogate models of the parametrized nonlocal problems. The method is an offline-online mechanism. In the offline phase, we develop the full-order model using the quadratic finite element method (FEM) to generate snapshots, and employ the model reduction method to process the snapshots and extract their key features. In the online phase, we use κ-nearest-neighbors regression to construct the surrogate model. In the numerical experiments, we first verify the convergence rate when applying quadratic FEM to the nonlocal problems. Subsequently, for linear and nonlinear nonlocal problems with random inputs, the numerical results illustrate the efficiency and accuracy of the surrogate models.