The core of smoothed particle hydrodynamics (SPH) is the nearest neighbor search subroutine. In this paper, a nearest neighbor search algorithm based on multiple background grids and supporting a variable smoothing length is introduced. Tests on lid-driven cavity flow show that this method provides high accuracy. Its parallelism was analyzed and tested experimentally; the results show that the method parallelizes well and that its accuracy improves as processors are added, so that efficiency grows in step with accuracy.
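The paper's multi-grid scheme is not reproduced here, but the underlying background-grid (cell-linked-list) neighbor search it builds on can be sketched briefly. The following is a minimal Python illustration assuming 2D particles and a single fixed smoothing length h; the function name and particle arrays are hypothetical.

```python
import numpy as np

def grid_neighbor_search(positions, h):
    """Cell-linked-list neighbor search: bin particles into cells of size h,
    then compare each particle only against the 3x3 block of adjacent cells."""
    cell = np.floor(positions / h).astype(int)          # integer cell index per particle
    buckets = {}
    for i, c in enumerate(map(tuple, cell)):
        buckets.setdefault(c, []).append(i)

    pairs = []
    for i, (cx, cy) in enumerate(map(tuple, cell)):
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in buckets.get((cx + dx, cy + dy), ()):
                    if j > i and np.linalg.norm(positions[i] - positions[j]) < h:
                        pairs.append((i, j))
    return pairs

# toy usage: 500 random particles in a unit square, support radius 0.05
rng = np.random.default_rng(0)
print(len(grid_neighbor_search(rng.random((500, 2)), 0.05)))
```

A variable smoothing length could presumably be handled, in the spirit of the paper, by maintaining one such grid per smoothing-length level and querying each of them.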
During the storehouse surface rolling construction of a core rockfill dam, the spreading thickness of the dam face is an important factor that affects the construction quality of the dam storehouse's rolling surface and the overall quality of the entire dam. Currently, spreading thickness during dam construction is monitored and controlled by manual sampling checks after spreading, which makes it difficult to monitor the entire dam storehouse surface. In this paper, we present an in-depth study based on real-time monitoring and control theory of storehouse surface rolling construction and obtain the rolling compaction thickness by analyzing the construction track of the rolling machine. By comparison, the traditional method can only analyze the rolling thickness of the dam storehouse surface after it has been compacted and cannot determine the thickness in real time. To solve these problems, our system monitors the construction progress of the leveling machine and employs a real-time spreading thickness monitoring model based on the K-nearest neighbor algorithm. Taking the LHK core rockfill dam in Southwest China as an example, we performed real-time monitoring of the spreading thickness and conducted real-time interactive queries regarding the spreading thickness. This approach provides a new method for controlling the spreading thickness of the core rockfill dam storehouse surface.
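To make the K-nearest-neighbor idea concrete, here is a hedged sketch of thickness interpolation from roller-track data using scikit-learn; the coordinates, thickness values, and parameter choices are illustrative assumptions, not the system described in the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# hypothetical data: (x, y) roller-track positions and the spreading thickness
# derived from elevation differences against the previously compacted layer
rng = np.random.default_rng(1)
xy = rng.uniform(0, 100, size=(1000, 2))                           # plan coordinates [m]
thickness = 0.8 + 0.05 * np.sin(xy[:, 0] / 10) + 0.01 * rng.normal(size=1000)

# k-nearest-neighbor interpolation of spreading thickness over the storehouse surface
knn = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(xy, thickness)
query = np.array([[25.0, 40.0]])                                   # point queried in real time
print(f"estimated spreading thickness at {query[0]}: {knn.predict(query)[0]:.3f} m")
```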
In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, that employed four classification algorithms: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN), as their main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performance, strengths and shortcomings of each algorithm were examined, and a conclusion was reached on which one performs best. It was evident from the literature reviewed that RF is overly sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but has the major drawback of becoming significantly slow as the size of the data grows, and the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability to deal with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning procedure, the high complexity of its computational processing, the numerous NN architectures to choose from, and the many algorithms used for training, most researchers recommend SVM and RF as easier and more widely usable methods that repeatedly achieve results with high accuracy and are often faster to implement.
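A minimal harness for this kind of comparison, using scikit-learn stand-ins for the four families (an MLP serving as the neural network) on a synthetic dataset; cross-validated accuracy and fit time cover only two of the attributes the review considers.

```python
from time import perf_counter
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, random_state=0)
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf", C=1.0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "NN": MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0),
}
for name, model in models.items():
    start = perf_counter()
    acc = cross_val_score(model, X, y, cv=5).mean()   # mean 5-fold accuracy
    print(f"{name}: accuracy={acc:.3f}, time={perf_counter() - start:.1f}s")
```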
This paper describes the nearest neighbor (NN) search algorithm on the GBD (generalized BD) tree. The GBD tree is a spatial data structure suitable for two- or three-dimensional data and has good performance characteristics with respect to dynamic data environments. In GIS and CAD systems, the R-tree and its successors have been used, and NN search algorithms have also been proposed to obtain good performance from the R-tree. The GBD tree, on the other hand, is superior to the R-tree with respect to exact-match retrieval, because it carries auxiliary data that uniquely determines the position of an object in the structure. The proposed NN search algorithm depends on this property of the GBD tree. The NN search algorithm on the GBD tree was studied and its performance was evaluated through experiments.
Existing interference protection systems lack automatic evaluation methods that provide scientific, objective and accurate assessment results. To address this issue, this paper develops a layout scheme by geometrically modeling the actual scene, so that a hand-held full-band spectrum analyzer can collect signal field strength values in complex indoor scenes. An improved prediction algorithm based on K-nearest neighbor non-parametric kernel regression is proposed to predict the signal field strengths over the whole plane before and after shielding. The most accurate set of data can then be picked out by comparison. The experimental results show that the improved prediction algorithm based on K-nearest neighbor non-parametric kernel regression can scientifically and objectively predict the signal strength of complex indoor scenes and evaluate the interference protection with high accuracy.
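The core of K-nearest-neighbor non-parametric kernel regression can be sketched in a few lines of NumPy: for each query location, average the field strengths of the k nearest measurements with Gaussian-kernel weights. The data, bandwidth, and k below are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def knn_kernel_regression(train_xy, train_strength, query_xy, k=8, bandwidth=2.0):
    """Predict field strength at query points: for each query, take the k nearest
    measured locations and form a Gaussian-kernel weighted average of their values."""
    preds = []
    for q in query_xy:
        d = np.linalg.norm(train_xy - q, axis=1)          # distances to all measurements
        idx = np.argsort(d)[:k]                           # k nearest measured points
        w = np.exp(-(d[idx] ** 2) / (2 * bandwidth ** 2)) # Gaussian kernel weights
        preds.append(np.sum(w * train_strength[idx]) / np.sum(w))
    return np.array(preds)

# hypothetical indoor measurement grid (metres) and signal strengths (dBm)
rng = np.random.default_rng(2)
pts = rng.uniform(0, 20, size=(300, 2))
rssi = -40 - 2.5 * np.linalg.norm(pts - [10, 10], axis=1) + rng.normal(0, 1, 300)
print(knn_kernel_regression(pts, rssi, np.array([[5.0, 5.0], [15.0, 12.0]])))
```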
In this paper, a memetic algorithm with competition (MAC) is proposed to solve the capacitated green vehicle routing problem (CGVRP). First, a permutation array called the traveling salesman problem (TSP) route is used to encode the solution, and an effective decoding method to construct the CGVRP route is presented accordingly. Second, a k-nearest neighbor (kNN) based initialization is presented to make use of the location information of the customers. Third, according to the characteristics of the CGVRP, the search operators in the variable neighborhood search (VNS) framework and the simulated annealing (SA) strategy are executed on the TSP route for all solutions. Moreover, the customer adjustment operator and the alternative fuel station (AFS) adjustment operator on the CGVRP route are executed for the elite solutions after competition. In addition, a crossover operator is employed to share information among different solutions. The effect of parameter settings is investigated using the Taguchi design-of-experiment method to suggest suitable values. Numerical tests demonstrate the effectiveness of both the competitive search and the decoding method. Moreover, extensive comparative results show that the proposed algorithm is more effective and efficient than existing methods in solving the CGVRP.
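Only the kNN-based initialization lends itself to a short sketch: build an initial TSP-route permutation by repeatedly visiting the nearest unvisited customer. The decoding into capacity- and fuel-feasible CGVRP routes, the VNS/SA search operators, and the competition mechanism are not shown, and the coordinates below are hypothetical.

```python
import numpy as np

def knn_initial_route(depot, customers):
    """Greedy nearest-neighbor construction of an initial TSP permutation:
    start at the depot and always visit the closest unvisited customer next."""
    route, current = [], depot
    unvisited = list(range(len(customers)))
    while unvisited:
        dists = [np.linalg.norm(customers[i] - current) for i in unvisited]
        nxt = unvisited.pop(int(np.argmin(dists)))
        route.append(nxt)
        current = customers[nxt]
    return route

rng = np.random.default_rng(3)
depot = np.array([50.0, 50.0])
customers = rng.uniform(0, 100, size=(15, 2))          # hypothetical customer coordinates
print(knn_initial_route(depot, customers))             # initial TSP-route encoding
```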
The EM algorithm is a very popular maximum likelihood estimation method: it is an iterative algorithm for computing the maximum likelihood estimator when the observed data are incomplete, and it is also a very effective algorithm for estimating the parameters of finite mixture models. However, the EM algorithm cannot guarantee finding the globally optimal solution and often falls into a local optimum, so it is sensitive to the choice of initial values for the iteration. The traditional EM algorithm selects initial values at random; we propose an improved method for selecting them. First, we use the k-nearest-neighbor method to delete outliers. Second, we use k-means to initialize the EM algorithm. Comparing this method with the original random initialization, numerical experiments show that parameter estimation with the proposed initialization of the EM algorithm is significantly better than with the original EM algorithm.
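A compact sketch of this initialization pipeline using off-the-shelf scikit-learn components: score each point by its mean distance to its k nearest neighbors, drop the most isolated points as outliers, run k-means on the remainder, and pass the resulting centers to the EM-based Gaussian mixture fit. The outlier threshold, cluster count, and data are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(6, 1, (200, 2)),
               rng.uniform(-10, 16, (10, 2))])          # two clusters plus a few outliers

# 1) kNN-based outlier removal: drop the 2% of points with the largest mean kNN distance
dist, _ = NearestNeighbors(n_neighbors=6).fit(X).kneighbors(X)
score = dist[:, 1:].mean(axis=1)                        # skip self-distance in column 0
X_clean = X[score < np.quantile(score, 0.98)]

# 2) k-means provides the initial means for the EM-based mixture fit
centers = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_clean).cluster_centers_
gmm = GaussianMixture(n_components=2, means_init=centers, random_state=0).fit(X_clean)
print(gmm.means_)
```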
To address the cross-ISP traffic problem caused by BitTorrent, we present the design and evaluation of a proximity-aware BitTorrent system. In our approach, clients generate global proximity-aware information by using landmark clustering; the tracker uses this proximity information to maintain all peers in an orderly way and hands back a biased subset consisting of the peers physically closest to the requestor. Our approach requires no cooperation between P2P users and their Internet infrastructures, such as ISPs or CDNs, and no constant path monitoring or probing of neighbors. The simulation results show that our approach not only reduces unnecessary cross-ISP traffic but also allows fast downloads.
A fast encoding algorithm is presented that makes full use of two characteristics of a vector: its sum and its variance. In this paper, a vector is separated into two subvectors, one containing the first half of the coordinates and the other containing the remaining coordinates. Three inequalities based on the sums and variances of a vector and its two subvectors are introduced to reject codewords that cannot be the nearest codeword. The simulation results show that the proposed algorithm is faster than the improved equal-average equal-variance nearest neighbor search (EENNS) algorithm.
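The rejection idea can be illustrated with the sum criterion alone: since |S(x) - S(c)| <= sqrt(k)·||x - c|| for k-dimensional vectors (Cauchy-Schwarz), a codeword whose sum differs from the input's sum by more than sqrt(k) times the current best distance cannot be the nearest codeword. The sketch below implements only this test over a sum-sorted codebook; the paper's additional subvector sum and variance inequalities add further, tighter rejections. Names and sizes are illustrative.

```python
import numpy as np

def nearest_codeword(x, codebook):
    """Nearest-codeword search with sum-based rejection: a codeword c can be
    skipped whenever |sum(x) - sum(c)| / sqrt(k) >= current best distance,
    since |sum(x) - sum(c)| <= sqrt(k) * ||x - c|| by Cauchy-Schwarz."""
    k = len(x)
    sums = codebook.sum(axis=1)
    order = np.argsort(np.abs(sums - x.sum()))          # try most promising codewords first
    best_i, best_d = order[0], np.linalg.norm(x - codebook[order[0]])
    for i in order[1:]:
        if abs(sums[i] - x.sum()) / np.sqrt(k) >= best_d:
            break                                       # all remaining codewords are rejected
        d = np.linalg.norm(x - codebook[i])
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

rng = np.random.default_rng(5)
codebook = rng.random((256, 16))                        # 256 codewords of dimension 16
print(nearest_codeword(rng.random(16), codebook))
```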
The whale optimization algorithm (WOA) is a new population-based meta-heuristic algorithm. WOA uses a shrinking encircling mechanism, spiral rise, and random learning strategies to update whale positions. WOA has the merits of simple calculation and high computational accuracy, but its convergence is slow and it easily falls into local optima. To overcome these shortcomings, this paper integrates adaptive neighborhood and hybrid mutation strategies into the whale optimization algorithm: the average distance from a whale to the other whales is designed as an adaptive neighborhood radius, and learning from the optimal solution within the neighborhood replaces the random learning strategy. The hybrid mutation strategy is used to enhance the algorithm's ability to jump out of local optima. A new whale optimization algorithm (HMNWOA) is proposed. The proposed algorithm inherits the global search capability of the original algorithm, enhances the exploitation ability, improves the quality of the population, and thus improves the convergence speed. A feature selection algorithm based on binary HMNWOA is also proposed. Twelve standard datasets from the UCI repository are used to test the validity of the proposed algorithm for feature selection. The experimental results show that HMNWOA is very competitive with six other popular feature selection methods in improving classification accuracy and reducing the number of features, and that it has strong search ability in the feature search space.
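A heavily simplified sketch of the adaptive-neighborhood idea follows: each whale's neighborhood radius is its average distance to the other whales, and where standard WOA would learn from a random whale, the best whale inside that neighborhood is used instead. The spiral-rise update and the hybrid mutation strategy are omitted, and the update rule below is a simplification of the full HMNWOA, not its published formulation.

```python
import numpy as np

def neighborhood_best(pop, fitness, i):
    """Adaptive neighborhood: radius = mean distance from whale i to all others;
    return the best whale inside that radius (whale i itself is always inside)."""
    d = np.linalg.norm(pop - pop[i], axis=1)
    radius = d[np.arange(len(pop)) != i].mean()
    inside = np.where(d <= radius)[0]
    return pop[inside[np.argmin(fitness[inside])]]

def hmn_woa_step(pop, fitness, best, a):
    """One simplified position update: encircle the global best when |A| < 1,
    otherwise learn from the adaptive-neighborhood best instead of a random whale."""
    new = pop.copy()
    for i in range(len(pop)):
        r = np.random.rand(pop.shape[1])
        A, C = 2 * a * r - a, 2 * np.random.rand(pop.shape[1])
        target = best if np.linalg.norm(A) < 1 else neighborhood_best(pop, fitness, i)
        new[i] = target - A * np.abs(C * target - pop[i])
    return new

# toy usage on the sphere function
f = lambda x: np.sum(x ** 2, axis=1)
pop = np.random.uniform(-5, 5, (20, 3))
for t in range(50):
    fit = f(pop)
    pop = hmn_woa_step(pop, fit, pop[np.argmin(fit)], a=2 * (1 - t / 50))
print(f(pop).min())
```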
It is a key challenge to exploit the label coupling relationship in multi-label classification (MLC) problems. Most previous work focused on pairwise label relations, in which generally only global statistical information is used to analyze the coupled label relationship. In this work, Bayesian and hypothesis-testing methods are first applied to predict the label set size of testing samples within their k nearest neighbor samples, which combines global and local statistical information; then the Apriori algorithm is used to mine the coupling relationship among multiple labels rather than pairwise labels, which exploits the label coupling relations more accurately and comprehensively. The experimental results on text, biology and audio datasets show that, compared with state-of-the-art algorithms, the proposed algorithm obtains better performance on five common criteria.
In this study, our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization (GWO) with Harris Hawks Optimization (HHO) for feature selection. The motivation for utilizing GWO and HHO stems from their bio-inspired nature and their demonstrated success in optimization problems. We aim to leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification. We selected leave-one-out cross-validation (LOOCV) to evaluate the performance of two widely used classifiers, k-nearest neighbors (KNN) and support vector machine (SVM), on high-dimensional cancer microarray data. The proposed method is extensively tested on six publicly available cancer microarray datasets, and a comprehensive comparison with recently published methods is conducted. Our hybrid algorithm demonstrates its effectiveness in improving classification performance, surpassing alternative approaches in terms of precision. The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification, thereby advancing the development of more efficient treatment strategies. The proposed hybrid method offers a promising solution to the gene selection problem in microarray-based cancer classification. It improves the accuracy and efficiency of cancer diagnosis and treatment, and its superior performance compared to other methods highlights its potential applicability in real-world cancer classification tasks. By harnessing the complementary search mechanisms of GWO and HHO, we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
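Wrapper methods of this kind need a fitness function for each candidate gene subset; a hedged sketch of LOOCV KNN accuracy on a binary feature mask is given below. The GWO/HHO search loop that proposes and updates the masks is not shown, and the data, mask, and size-penalty weight are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def subset_fitness(X, y, mask, alpha=0.99):
    """Wrapper fitness for a binary feature mask: LOOCV accuracy of a KNN
    classifier on the selected genes, lightly penalised by subset size."""
    if not mask.any():
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask], y, cv=LeaveOneOut()).mean()
    return alpha * acc + (1 - alpha) * (1 - mask.sum() / mask.size)

# hypothetical microarray-like data: 60 samples, 500 genes
rng = np.random.default_rng(6)
X, y = rng.normal(size=(60, 500)), rng.integers(0, 2, 60)
mask = rng.random(500) < 0.05                           # a candidate subset of roughly 25 genes
print(subset_fitness(X, y, mask))
```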
Today computers are used to store data in memory and then process them. In our big data era, we face the challenge of storing and processing data simply because of their fast, ever-growing size. Quantum computation offers elegant, inherently quantum-mechanical solutions to these two prominent issues. Through careful design that employs superposition, entanglement, and interference of quantum states, a quantum algorithm can allow a quantum computer to store datasets of exponentially large size in linear space and then process them in parallel. Quantum computing has found its way into the world of machine learning, where new ideas and approaches are in great need as classical computers have reached their capacity and the demand for processing big data grows much faster than the computing power classical computers can provide today. Nearest neighbor algorithms are simple, robust, and versatile supervised machine learning algorithms, which store all training data points as their learned "model" and make a prediction for a new test data point by computing the distances between the query point and all the training data points. Quantum counterparts of these classical algorithms provide efficient and elegant ways to deal with the two major issues of storing data in memory and computing the distances. The purpose of our study is to select two similar quantum nearest neighbor algorithms and use a simple dataset to give insight into how they work, highlight their quantum nature, and compare their performances on IBM's quantum simulator.
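Quantum nearest-neighbor algorithms of this kind typically estimate distances through the overlap of amplitude-encoded states, for example via a swap test whose ancilla reads 0 with probability (1 + |⟨a|b⟩|²)/2. The sketch below is a plain NumPy simulation of those measurement statistics (not a circuit run on IBM's simulator) and assumes real vectors with a non-negative inner product; the function names are hypothetical.

```python
import numpy as np

def amplitude_encode(x):
    """Amplitude-encode a real vector as a normalized quantum state."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

def swap_test_distance(a, b, shots=10000, seed=7):
    """Simulate swap-test sampling: P(ancilla = 0) = (1 + |<a|b>|^2) / 2.
    Estimate the overlap from finite shots and convert it to a distance
    between the encoded unit vectors (assuming <a|b> >= 0)."""
    rng = np.random.default_rng(seed)
    overlap_sq = np.dot(amplitude_encode(a), amplitude_encode(b)) ** 2
    zeros = rng.binomial(shots, (1 + overlap_sq) / 2)   # simulated counts of measuring |0>
    est_overlap_sq = max(0.0, 2 * zeros / shots - 1)
    return float(np.sqrt(max(0.0, 2 - 2 * np.sqrt(est_overlap_sq))))

print(swap_test_distance([1, 2, 3, 4], [1, 2, 3, 5]))
```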
In this article, a new optimization system that uses few features to recognize locomotion with high classification accuracy is proposed. The optimization system consists of three parts. First, features of the mixed mechanical signal data are extracted from each 200 ms analysis window after each foot contact event. Then, the binary version of the hybrid Gray Wolf Optimization and Particle Swarm Optimization (BGWOPSO) algorithm is used to select features, and the selected features are optimized and assigned different weights by the Biogeography-Based Optimization (BBO) algorithm. Finally, an improved K-Nearest Neighbor (KNN) classifier is employed for intention recognition. This classifier has the advantages of high accuracy, few parameters, and a low memory burden. The optimization system is evaluated on data from eight patients with transfemoral amputations. The numerical results indicate that the proposed model can recognize nine daily locomotion modes (i.e., low-, mid-, and fast-speed level-ground walking, ramp ascent/descent, stair ascent/descent, and sit/stand) using only seven features, with an accuracy of 96.66% ± 0.68%. For real-time prediction on a powered knee prosthesis, the shortest prediction time is only 9.8 ms. These promising results reveal the potential of intention recognition based on the proposed system for high-level control of the prosthetic knee.
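The final stage is a KNN classifier operating on weighted features; a hedged sketch of such a weighted-distance KNN is shown below, with made-up feature weights standing in for the BBO-optimized ones and random data standing in for the gait features.

```python
import numpy as np
from collections import Counter

def weighted_knn_predict(X_train, y_train, x_query, weights, k=5):
    """KNN with a feature-weighted Euclidean distance: features judged more
    informative (larger weight) contribute more to the distance."""
    d = np.sqrt(np.sum(weights * (X_train - x_query) ** 2, axis=1))
    nearest = np.argsort(d)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# hypothetical data: 7 selected gait features, 9 locomotion-mode labels
rng = np.random.default_rng(8)
X_train = rng.normal(size=(400, 7))
y_train = rng.integers(0, 9, 400)
weights = np.array([0.9, 0.4, 1.0, 0.7, 0.2, 0.8, 0.5])   # stand-in for BBO-optimised weights
print(weighted_knn_predict(X_train, y_train, rng.normal(size=7), weights))
```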
University dropout is a growing problem which, in recent years, has been addressed with computer techniques that assist in the detection process. This paper presents the evaluation of several prediction algorithms for detecting students with a high possibility of dropping out. The approach uses real data from past academic periods to create a dataset with different information about the students (i.e., personal, economic, and academic records). The algorithms selected for the experimental phase were the J48 decision tree, K-nearest neighbors, and support vector machines. We use two similarity metrics to split the dataset, so that each case is evaluated against cases with at least 80% similarity. We use data from 2010 to 2016 with real student information to predict whether there is a possibility of actual academic dropout within a period. The results show that the J48 algorithm reaches the best performance in both experiments. In addition, the tree generated for each student is taken as a path of attention, reaching around 88% effectiveness. Finally, the conclusions discuss the contributions of the paper and propose a future line of research.
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, i.e., data that sum to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. The linear regression model is a commonly used statistical modeling technique applied in various settings to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and analysis of the partial effects of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds the best estimates of parameters in statistical models that depend on unobserved variables or data; this is maximum likelihood or maximum a posteriori (MAP) estimation. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function, and the maximization (M) step finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performs on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) imputation and mean imputation, in terms of Aitchison distances and covariance.
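A hedged sketch of this comparison set-up: generate a small compositional dataset, remove some entries at random, impute with k-NN and mean imputation from scikit-learn, and score each result against the truth with the Aitchison distance (the Euclidean distance between centered log-ratio transforms, averaged over rows here). The data and missingness rate are illustrative assumptions, and the EM-based imputation itself is not reproduced.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

def clr(x):
    """Centered log-ratio transform of compositions (rows summing to a constant)."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def aitchison_distance(x, y):
    """Mean row-wise Aitchison distance: Euclidean distance between clr transforms."""
    return np.linalg.norm(clr(x) - clr(y), axis=1).mean()

rng = np.random.default_rng(9)
true = rng.dirichlet([4, 3, 2, 1], size=200)            # synthetic 4-part compositions
observed = true.copy()
observed[rng.random(true.shape) < 0.1] = np.nan         # 10% of entries missing at random

for name, imputer in [("kNN", KNNImputer(n_neighbors=5)), ("mean", SimpleImputer())]:
    filled = imputer.fit_transform(observed)
    filled = filled / filled.sum(axis=1, keepdims=True) # re-close rows onto the simplex
    print(f"{name} imputation, Aitchison distance to truth: "
          f"{aitchison_distance(filled, true):.4f}")
```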