During the storehouse surface rolling construction of a core rockfilldam, the spreading thickness of dam face is an important factor that affects the construction quality of the dam storehouse' rolling surface and...During the storehouse surface rolling construction of a core rockfilldam, the spreading thickness of dam face is an important factor that affects the construction quality of the dam storehouse' rolling surface and the overallquality of the entire dam. Currently, the method used to monitor and controlspreading thickness during the dam construction process is artificialsampling check after spreading, which makes it difficult to monitor the entire dam storehouse surface. In this paper, we present an in-depth study based on real-time monitoring and controltheory of storehouse surface rolling construction and obtain the rolling compaction thickness by analyzing the construction track of the rolling machine. Comparatively, the traditionalmethod can only analyze the rolling thickness of the dam storehouse surface after it has been compacted and cannot determine the thickness of the dam storehouse surface in realtime. To solve these problems, our system monitors the construction progress of the leveling machine and employs a real-time spreading thickness monitoring modelbased on the K-nearest neighbor algorithm. Taking the LHK core rockfilldam in Southwest China as an example, we performed real-time monitoring for the spreading thickness and conducted real-time interactive queries regarding the spreading thickness. This approach provides a new method for controlling the spreading thickness of the core rockfilldam storehouse surface.展开更多
The core of smoothed particle hydrodynamics (SPH) is the nearest neighbor search subroutine. In this paper, a nearest neighbor search algorithm which is based on multiple background grids and support variable smooth...The core of smoothed particle hydrodynamics (SPH) is the nearest neighbor search subroutine. In this paper, a nearest neighbor search algorithm which is based on multiple background grids and support variable smooth length is introduced. Through tested on lid driven cavity flow, it is clear that this method can provide high accuracy. Analysis and experiments have been made on its parallelism, and the results show that this method has better parallelism and with adding processors its accuracy become higher, thus it achieves that efficiency grows in pace with accuracy.展开更多
This paper describes the nearest neighbor (NN) search algorithm on the GBD(generalized BD) tree. The GBD tree is a spatial data structure suitable for two-or three-dimensional data and has good performance characteris...This paper describes the nearest neighbor (NN) search algorithm on the GBD(generalized BD) tree. The GBD tree is a spatial data structure suitable for two-or three-dimensional data and has good performance characteristics with respect to the dynamic data environment. On GIS and CAD systems, the R-tree and its successors have been used. In addition, the NN search algorithm is also proposed in an attempt to obtain good performance from the R-tree. On the other hand, the GBD tree is superior to the R-tree with respect to exact match retrieval, because the GBD tree has auxiliary data that uniquely determines the position of the object in the structure. The proposed NN search algorithm depends on the property of the GBD tree described above. The NN search algorithm on the GBD tree was studied and the performance thereof was evaluated through experiments.展开更多
In this paper, sixty-eight research articles published between 2000 and 2017 as well as textbooks which employed four classification algorithms: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (...In this paper, sixty-eight research articles published between 2000 and 2017 as well as textbooks which employed four classification algorithms: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN) as the main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths and shortcomings of each of the algorithms were examined, and finally, a conclusion was arrived at on which one has higher performance. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset and is occasionally unstable and tends to overfit in the model. KNN is easy to implement and understand but has a major drawback of becoming significantly slow as the size of the data in use grows, while the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability in dealing with unbalanced data. Larger input datasets will lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of their time-consuming parameter tuning procedure, high level of complexity in computational processing, the numerous types of NN architectures to choose from and the high number of algorithms used for training, most researchers recommend SVM and RF as easier and wieldy used methods which repeatedly achieve results with high accuracies and are often faster to implement.展开更多
The complexity and unpredictability of clear air turbulence(CAT)pose significant challenges to aviation safety.Accurate prediction of turbulence events is crucial for reducing flight accidents and economic losses.Howe...The complexity and unpredictability of clear air turbulence(CAT)pose significant challenges to aviation safety.Accurate prediction of turbulence events is crucial for reducing flight accidents and economic losses.However,traditional turbulence prediction methods,such as ensemble forecasting techniques,have certain limitations:they only consider turbulence data from the most recent period,making it difficult to capture the nonlinear relationships present in turbulence.This study proposes a turbulence forecasting model based on the K-nearest neighbor(KNN)algorithm,which uses a combination of eight CAT diagnostic features as the feature vector and introduces CAT diagnostic feature weights to improve prediction accuracy.The model calculates the results of seven years of CAT diagnostics from 125 to 500 hPa obtained from the ECMWF fifth-generation reanalysis dataset(ERA5)as feature vector inputs and combines them with the labels of Pilot Reports(PIREP)annotated data,where each sample contributes to the prediction result.By measuring the distance between the current CAT diagnostic variable and other variables,the model determines the climatically most similar neighbors and identifies the turbulence intensity category caused by the current variable.To evaluate the model’s performance in diagnosing high-altitude turbulence over Colorado,PIREP cases were randomly selected for analysis.The results show that the weighted KNN(W-KNN)model exhibits higher skill in turbulence prediction,and outperforms traditional prediction methods and other machine learning models(e.g.,Random Forest)in capturing moderate or greater(MOG)level turbulence.The performance of the model was confirmed by evaluating the receiver operating characteristic(ROC)curve,maximum True Skill Statistic(maxTSS=0.552),and reliability plot.A robust score(area under the curve:AUC=0.86)was obtained,and the model demonstrated sensitivity to seasonal and annual climate fluctuations.展开更多
In this article,a new optimization system that uses few features to recognize locomotion with high classification accuracy is proposed.The optimization system consists of three parts.First,the features of the mixed me...In this article,a new optimization system that uses few features to recognize locomotion with high classification accuracy is proposed.The optimization system consists of three parts.First,the features of the mixed mechanical signal data are extracted from each analysis window of 200 ms after each foot contact event.Then,the Binary version of the hybrid Gray Wolf Optimization and Particle Swarm Optimization(BGWOPSO)algorithm is used to select features.And,the selected features are optimized and assigned different weights by the Biogeography-Based Optimization(BBO)algorithm.Finally,an improved K-Nearest Neighbor(KNN)classifier is employed for intention recognition.This classifier has the advantages of high accuracy,few parameters as well as low memory burden.Based on data from eight patients with transfemoral amputations,the optimization system is evaluated.The numerical results indicate that the proposed model can recognize nine daily locomotion modes(i.e.,low-,mid-,and fast-speed level-ground walking,ramp ascent/decent,stair ascent/descent,and sit/stand)by only seven features,with an accuracy of 96.66%±0.68%.As for real-time prediction on a powered knee prosthesis,the shortest prediction time is only 9.8 ms.These promising results reveal the potential of intention recognition based on the proposed system for high-level control of the prosthetic knee.展开更多
To address cross-ISP traffic problem caused by BitTorrent,we present our design and evaluation of a proximity-aware BitTorrent system. In our approach,clients generate global proximity-aware information by using landm...To address cross-ISP traffic problem caused by BitTorrent,we present our design and evaluation of a proximity-aware BitTorrent system. In our approach,clients generate global proximity-aware information by using landmark clustering;the tracker uses this proximity to maintain all peers in an orderly way and hands back a biased subset consisting of the peers who are physically closest to the requestor. Our approach requires no co-operation between P2P users and their Internet infra structures,such as ISPs or CDNs,no constantly path monitoring or probing their neighbors. The simulation results show that our approach can not only reduce unnecessary cross-ISP traffic,but also allow downloadsing fast.展开更多
A fast encoding algorithm was presented which made full use of two characteristics of a vector, its sum and variance. In this paper, a vector was separated into two subvectors, one is the first half of the coordinates...A fast encoding algorithm was presented which made full use of two characteristics of a vector, its sum and variance. In this paper, a vector was separated into two subvectors, one is the first half of the coordinates and the other contains the remaining coordinates. Three inequalities based on the characteristics of the sums and variances of a vector and its two subvectors were introduced to reject those codewords which are impossible to be the nearest codeword. The simulation results show that the proposed algorithm is faster than the improved equal average eaual variance nearest neighbor search (EENNS) algorithm.展开更多
The EM algorithm is a very popular maximum likelihood estimation method, the iterative algorithm for solving the maximum likelihood estimator when the observation data is the incomplete data, but also is very effectiv...The EM algorithm is a very popular maximum likelihood estimation method, the iterative algorithm for solving the maximum likelihood estimator when the observation data is the incomplete data, but also is very effective algorithm to estimate the finite mixture model parameters. However, EM algorithm can not guarantee to find the global optimal solution, and often easy to fall into local optimal solution, so it is sensitive to the determination of initial value to iteration. Traditional EM algorithm select the initial value at random, we propose an improved method of selection of initial value. First, we use the k-nearest-neighbor method to delete outliers. Second, use the k-means to initialize the EM algorithm. Compare this method with the original random initial value method, numerical experiments show that the parameter estimation effect of the initialization of the EM algorithm is significantly better than the effect of the original EM algorithm.展开更多
It is a key challenge to exploit the label coupling relationship in multi-label classification(MLC)problems.Most previous work focused on label pairwise relations,in which generally only global statistical informati...It is a key challenge to exploit the label coupling relationship in multi-label classification(MLC)problems.Most previous work focused on label pairwise relations,in which generally only global statistical information is used to analyze the coupled label relationship.In this work,firstly Bayesian and hypothesis testing methods are applied to predict the label set size of testing samples within their k nearest neighbor samples,which combines global and local statistical information,and then apriori algorithm is used to mine the label coupling relationship among multiple labels rather than pairwise labels,which can exploit the label coupling relations more accurately and comprehensively.The experimental results on text,biology and audio datasets shown that,compared with the state-of-the-art algorithm,the proposed algorithm can obtain better performance on 5 common criteria.展开更多
In this paper, a memetic algorithm with competition(MAC) is proposed to solve the capacitated green vehicle routing problem(CGVRP). Firstly, the permutation array called traveling salesman problem(TSP) route is used t...In this paper, a memetic algorithm with competition(MAC) is proposed to solve the capacitated green vehicle routing problem(CGVRP). Firstly, the permutation array called traveling salesman problem(TSP) route is used to encode the solution, and an effective decoding method to construct the CGVRP route is presented accordingly. Secondly, the k-nearest neighbor(k NN) based initialization is presented to take use of the location information of the customers. Thirdly, according to the characteristics of the CGVRP, the search operators in the variable neighborhood search(VNS) framework and the simulated annealing(SA) strategy are executed on the TSP route for all solutions. Moreover, the customer adjustment operator and the alternative fuel station(AFS) adjustment operator on the CGVRP route are executed for the elite solutions after competition. In addition, the crossover operator is employed to share information among different solutions. The effect of parameter setting is investigated using the Taguchi method of design-ofexperiment to suggest suitable values. Via numerical tests, it demonstrates the effectiveness of both the competitive search and the decoding method. Moreover, extensive comparative results show that the proposed algorithm is more effective and efficient than the existing methods in solving the CGVRP.展开更多
Today computers are used to store data in memory and then process them. In our big data era, we are facing the challenge of storing and processing the data simply due to their fast ever growing size. Quantum computati...Today computers are used to store data in memory and then process them. In our big data era, we are facing the challenge of storing and processing the data simply due to their fast ever growing size. Quantum computation offers solutions to these two prominent issues quantum mechanically and beautifully. Through careful design to employ superposition, entanglement, and interference of quantum states, a quantum algorithm can allow a quantum computer to store datasets of exponentially large size as linear size and then process them in parallel. Quantum computing has found its way in the world of machine learning where new ideas and approaches are in great need as the classical computers have reached their capacity and the demand for processing big data grows much faster than the computing power the classical computers can provide today. Nearest neighbor algorithms are simple, robust, and versatile supervised machine learning algorithms, which store all training data points as their learned “model” and make the prediction of a new test data point by computing the distances between the query point and all the training data points. Quantum counterparts of these classical algorithms provide efficient and elegant ways to deal with the two major issues of storing data in memory and computing the distances. The purpose of our study is to select two similar quantum nearest neighbor algorithms and use a simple dataset to give insight into how they work, highlight their quantum nature, and compare their performances on IBM’s quantum simulator.展开更多
基金supported by the Innovative Research Groups of National Natural Science Foundation of China(No. 51621092)National Basic Research Program of China ("973" Program, No. 2013CB035904)National Natural Science Foundation of China (No. 51439005)
文摘During the storehouse surface rolling construction of a core rockfilldam, the spreading thickness of dam face is an important factor that affects the construction quality of the dam storehouse' rolling surface and the overallquality of the entire dam. Currently, the method used to monitor and controlspreading thickness during the dam construction process is artificialsampling check after spreading, which makes it difficult to monitor the entire dam storehouse surface. In this paper, we present an in-depth study based on real-time monitoring and controltheory of storehouse surface rolling construction and obtain the rolling compaction thickness by analyzing the construction track of the rolling machine. Comparatively, the traditionalmethod can only analyze the rolling thickness of the dam storehouse surface after it has been compacted and cannot determine the thickness of the dam storehouse surface in realtime. To solve these problems, our system monitors the construction progress of the leveling machine and employs a real-time spreading thickness monitoring modelbased on the K-nearest neighbor algorithm. Taking the LHK core rockfilldam in Southwest China as an example, we performed real-time monitoring for the spreading thickness and conducted real-time interactive queries regarding the spreading thickness. This approach provides a new method for controlling the spreading thickness of the core rockfilldam storehouse surface.
基金Project supported by the National Natural Science Foundation of China(Grant No.11002086)the Shanghai Leading Academic Discipline Project(Grant No.J50103)
文摘The core of smoothed particle hydrodynamics (SPH) is the nearest neighbor search subroutine. In this paper, a nearest neighbor search algorithm which is based on multiple background grids and support variable smooth length is introduced. Through tested on lid driven cavity flow, it is clear that this method can provide high accuracy. Analysis and experiments have been made on its parallelism, and the results show that this method has better parallelism and with adding processors its accuracy become higher, thus it achieves that efficiency grows in pace with accuracy.
文摘This paper describes the nearest neighbor (NN) search algorithm on the GBD(generalized BD) tree. The GBD tree is a spatial data structure suitable for two-or three-dimensional data and has good performance characteristics with respect to the dynamic data environment. On GIS and CAD systems, the R-tree and its successors have been used. In addition, the NN search algorithm is also proposed in an attempt to obtain good performance from the R-tree. On the other hand, the GBD tree is superior to the R-tree with respect to exact match retrieval, because the GBD tree has auxiliary data that uniquely determines the position of the object in the structure. The proposed NN search algorithm depends on the property of the GBD tree described above. The NN search algorithm on the GBD tree was studied and the performance thereof was evaluated through experiments.
文摘In this paper, sixty-eight research articles published between 2000 and 2017 as well as textbooks which employed four classification algorithms: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN) as the main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths and shortcomings of each of the algorithms were examined, and finally, a conclusion was arrived at on which one has higher performance. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset and is occasionally unstable and tends to overfit in the model. KNN is easy to implement and understand but has a major drawback of becoming significantly slow as the size of the data in use grows, while the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability in dealing with unbalanced data. Larger input datasets will lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of their time-consuming parameter tuning procedure, high level of complexity in computational processing, the numerous types of NN architectures to choose from and the high number of algorithms used for training, most researchers recommend SVM and RF as easier and wieldy used methods which repeatedly achieve results with high accuracies and are often faster to implement.
基金Supported by the Nanjing University of Aeronautics and Astronautics(KFB2305601).
文摘The complexity and unpredictability of clear air turbulence(CAT)pose significant challenges to aviation safety.Accurate prediction of turbulence events is crucial for reducing flight accidents and economic losses.However,traditional turbulence prediction methods,such as ensemble forecasting techniques,have certain limitations:they only consider turbulence data from the most recent period,making it difficult to capture the nonlinear relationships present in turbulence.This study proposes a turbulence forecasting model based on the K-nearest neighbor(KNN)algorithm,which uses a combination of eight CAT diagnostic features as the feature vector and introduces CAT diagnostic feature weights to improve prediction accuracy.The model calculates the results of seven years of CAT diagnostics from 125 to 500 hPa obtained from the ECMWF fifth-generation reanalysis dataset(ERA5)as feature vector inputs and combines them with the labels of Pilot Reports(PIREP)annotated data,where each sample contributes to the prediction result.By measuring the distance between the current CAT diagnostic variable and other variables,the model determines the climatically most similar neighbors and identifies the turbulence intensity category caused by the current variable.To evaluate the model’s performance in diagnosing high-altitude turbulence over Colorado,PIREP cases were randomly selected for analysis.The results show that the weighted KNN(W-KNN)model exhibits higher skill in turbulence prediction,and outperforms traditional prediction methods and other machine learning models(e.g.,Random Forest)in capturing moderate or greater(MOG)level turbulence.The performance of the model was confirmed by evaluating the receiver operating characteristic(ROC)curve,maximum True Skill Statistic(maxTSS=0.552),and reliability plot.A robust score(area under the curve:AUC=0.86)was obtained,and the model demonstrated sensitivity to seasonal and annual climate fluctuations.
基金This research was supported in part by the National Key Research and Development Program of China under Grant 2018YFC2001300in part by the National Natural Science Foundation of China under Grant 91948302,Grant 91848204,and Grant 52021003the Project of Scientific and Technological Development Plan of Jilin Province under Grant 20220508130RC.
文摘In this article,a new optimization system that uses few features to recognize locomotion with high classification accuracy is proposed.The optimization system consists of three parts.First,the features of the mixed mechanical signal data are extracted from each analysis window of 200 ms after each foot contact event.Then,the Binary version of the hybrid Gray Wolf Optimization and Particle Swarm Optimization(BGWOPSO)algorithm is used to select features.And,the selected features are optimized and assigned different weights by the Biogeography-Based Optimization(BBO)algorithm.Finally,an improved K-Nearest Neighbor(KNN)classifier is employed for intention recognition.This classifier has the advantages of high accuracy,few parameters as well as low memory burden.Based on data from eight patients with transfemoral amputations,the optimization system is evaluated.The numerical results indicate that the proposed model can recognize nine daily locomotion modes(i.e.,low-,mid-,and fast-speed level-ground walking,ramp ascent/decent,stair ascent/descent,and sit/stand)by only seven features,with an accuracy of 96.66%±0.68%.As for real-time prediction on a powered knee prosthesis,the shortest prediction time is only 9.8 ms.These promising results reveal the potential of intention recognition based on the proposed system for high-level control of the prosthetic knee.
基金supported in part by the National High-tech Research and Development Program (863 Program) of China under Grant No. 2009AA01Z210, No. 2009AA01Z250 and No. 2008AA01A324support from Guangdong Ministry of Education Industry-Academia-Research project No. 2009B090300315EU FP7 Project (INFSO-ICT- 215549)
文摘To address cross-ISP traffic problem caused by BitTorrent,we present our design and evaluation of a proximity-aware BitTorrent system. In our approach,clients generate global proximity-aware information by using landmark clustering;the tracker uses this proximity to maintain all peers in an orderly way and hands back a biased subset consisting of the peers who are physically closest to the requestor. Our approach requires no co-operation between P2P users and their Internet infra structures,such as ISPs or CDNs,no constantly path monitoring or probing their neighbors. The simulation results show that our approach can not only reduce unnecessary cross-ISP traffic,but also allow downloadsing fast.
文摘A fast encoding algorithm was presented which made full use of two characteristics of a vector, its sum and variance. In this paper, a vector was separated into two subvectors, one is the first half of the coordinates and the other contains the remaining coordinates. Three inequalities based on the characteristics of the sums and variances of a vector and its two subvectors were introduced to reject those codewords which are impossible to be the nearest codeword. The simulation results show that the proposed algorithm is faster than the improved equal average eaual variance nearest neighbor search (EENNS) algorithm.
文摘The EM algorithm is a very popular maximum likelihood estimation method, the iterative algorithm for solving the maximum likelihood estimator when the observation data is the incomplete data, but also is very effective algorithm to estimate the finite mixture model parameters. However, EM algorithm can not guarantee to find the global optimal solution, and often easy to fall into local optimal solution, so it is sensitive to the determination of initial value to iteration. Traditional EM algorithm select the initial value at random, we propose an improved method of selection of initial value. First, we use the k-nearest-neighbor method to delete outliers. Second, use the k-means to initialize the EM algorithm. Compare this method with the original random initial value method, numerical experiments show that the parameter estimation effect of the initialization of the EM algorithm is significantly better than the effect of the original EM algorithm.
基金Supported by Australian Research Council Discovery(DP130102691)the National Science Foundation of China(61302157)+1 种基金China National 863 Project(2012AA12A308)China Pre-research Project of Nuclear Industry(FZ1402-08)
文摘It is a key challenge to exploit the label coupling relationship in multi-label classification(MLC)problems.Most previous work focused on label pairwise relations,in which generally only global statistical information is used to analyze the coupled label relationship.In this work,firstly Bayesian and hypothesis testing methods are applied to predict the label set size of testing samples within their k nearest neighbor samples,which combines global and local statistical information,and then apriori algorithm is used to mine the label coupling relationship among multiple labels rather than pairwise labels,which can exploit the label coupling relations more accurately and comprehensively.The experimental results on text,biology and audio datasets shown that,compared with the state-of-the-art algorithm,the proposed algorithm can obtain better performance on 5 common criteria.
基金supported by the National Science Fund for Distinguished Young Scholars of China(61525304)the National Natural Science Foundation of China(61873328)
文摘In this paper, a memetic algorithm with competition(MAC) is proposed to solve the capacitated green vehicle routing problem(CGVRP). Firstly, the permutation array called traveling salesman problem(TSP) route is used to encode the solution, and an effective decoding method to construct the CGVRP route is presented accordingly. Secondly, the k-nearest neighbor(k NN) based initialization is presented to take use of the location information of the customers. Thirdly, according to the characteristics of the CGVRP, the search operators in the variable neighborhood search(VNS) framework and the simulated annealing(SA) strategy are executed on the TSP route for all solutions. Moreover, the customer adjustment operator and the alternative fuel station(AFS) adjustment operator on the CGVRP route are executed for the elite solutions after competition. In addition, the crossover operator is employed to share information among different solutions. The effect of parameter setting is investigated using the Taguchi method of design-ofexperiment to suggest suitable values. Via numerical tests, it demonstrates the effectiveness of both the competitive search and the decoding method. Moreover, extensive comparative results show that the proposed algorithm is more effective and efficient than the existing methods in solving the CGVRP.
文摘Today computers are used to store data in memory and then process them. In our big data era, we are facing the challenge of storing and processing the data simply due to their fast ever growing size. Quantum computation offers solutions to these two prominent issues quantum mechanically and beautifully. Through careful design to employ superposition, entanglement, and interference of quantum states, a quantum algorithm can allow a quantum computer to store datasets of exponentially large size as linear size and then process them in parallel. Quantum computing has found its way in the world of machine learning where new ideas and approaches are in great need as the classical computers have reached their capacity and the demand for processing big data grows much faster than the computing power the classical computers can provide today. Nearest neighbor algorithms are simple, robust, and versatile supervised machine learning algorithms, which store all training data points as their learned “model” and make the prediction of a new test data point by computing the distances between the query point and all the training data points. Quantum counterparts of these classical algorithms provide efficient and elegant ways to deal with the two major issues of storing data in memory and computing the distances. The purpose of our study is to select two similar quantum nearest neighbor algorithms and use a simple dataset to give insight into how they work, highlight their quantum nature, and compare their performances on IBM’s quantum simulator.