With the acceleration of the intelligent transformation of energy systems, the monitoring of equipment operation status and the optimization of production processes in thermal power plants face the challenge of multi-source heterogeneous data integration. In view of the heterogeneous characteristics of physical sensor data (including the temperature, vibration, and pressure data generated by boilers, steam turbines, and other key equipment) and the real-time working-condition data of SCADA systems, this paper proposes a multi-source heterogeneous data fusion and analysis platform for thermal power plants based on edge computing and deep learning. By constructing a multi-level fusion architecture, the platform adopts a dynamic weight allocation strategy and a 5D digital twin model to realize the collaborative analysis of physical sensor data, simulation results, and expert knowledge. The data fusion module combines Kalman filtering, wavelet transforms, and Bayesian estimation to solve the problems of time-series alignment and dimensional differences in the data. Simulation results show that the data fusion accuracy can be improved to more than 98%, and the computation delay can be controlled within 500 ms. The data analysis module integrates a Dymola simulation model and the AERMOD pollutant diffusion model, and supports cascade analysis of boiler combustion efficiency prediction and flue gas emission monitoring; the system response time is less than 2 s, and the data consistency verification accuracy reaches 99.5%.
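The Kalman-filtering step of the fusion module can be illustrated with a minimal scalar filter smoothing one noisy sensor stream. This is a sketch only; the process and measurement variances (`q`, `r`) below are illustrative assumptions, not values from the platform.

```python
# Minimal scalar Kalman filter: smooths a noisy 1-D sensor stream.
# q: process-noise variance, r: measurement-noise variance (both assumed).
def kalman_1d(measurements, q=1e-3, r=0.5, x0=0.0, p0=1.0):
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p += q                  # predict: uncertainty grows by process noise
        k = p / (p + r)         # Kalman gain
        x += k * (z - x)        # update: move estimate toward measurement
        p *= (1 - k)            # update: uncertainty shrinks
        estimates.append(x)
    return estimates
```

In a real fusion pipeline the same recursion runs per channel (or in vector form) before the Bayesian combination step.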
Distribution networks are important public infrastructure necessary for people's livelihoods. However, extreme natural disasters, such as earthquakes, typhoons, and mudslides, severely threaten the safe and stable operation of distribution networks and the power supplies needed for daily life. Therefore, considering the requirements for distribution network disaster prevention and mitigation, there is an urgent need for in-depth research on risk assessment methods for distribution networks under extreme natural disaster conditions. This paper accesses multi-source data, presents data quality improvement methods for distribution networks, and conducts data-driven active fault diagnosis and disaster damage analysis and evaluation. Furthermore, the paper realizes real-time, accurate access to distribution network disaster information. A case study shows that the proposed approach performs an accurate and rapid assessment of cross-sectional risk, and that the minimum average annual outage time in the ring network can be reduced to 3 h/a. The approach proposed in this paper can provide technical support for further improving the ability of distribution networks to cope with extreme natural disasters.
Taking the Ming Tombs Forest Farm in Beijing as the research object, this research applied multi-source data fusion and GIS heat-map overlay analysis techniques. It systematically collected bird observation point data from the Global Biodiversity Information Facility (GBIF), population distribution data from the Oak Ridge National Laboratory (ORNL) in the United States, and information on the composition of tree species in forest areas suitable for birds together with the forest geographical information of the Ming Tombs Forest Farm, based on literature research and field investigations. Using GIS technology, spatial processing was carried out on the bird observation points and population distribution data to identify suitable bird-watching areas in different seasons; these areas were then classified by suitability value into grades ranging from unsuitable to highly suitable. The findings indicated significant spatial heterogeneity in the bird-watching suitability of the Ming Tombs Forest Farm. The north side of the reservoir was a core area of high suitability in all seasons. Mature broad-leaved mixed forests in the deeper areas supported the overlapping ecological niches of bird species such as Zosterops simplex and Urocissa erythrorhyncha, whereas the coniferous pure forests and mixed forests at the shallow forest edge were more suitable for specialized species such as Carduelis sinica. The southern urban area and the core area of the mausoleums had relatively low suitability due to ecological fragmentation or human interference. Based on these results, this paper proposed a three-level protection framework of core-area conservation, buffer-zone management, and isolation-zone construction, together with a spatio-temporally coordinated human-bird co-existence strategy. It also suggested that the human-bird co-existence space could be optimized through measures such as constructing sound and light buffer interfaces, restoring ecological corridors, and integrating cultural heritage elements. This research provides an operational technical approach and decision-making support for the scientific planning of bird-watching sites and the coordination of ecological protection and tourism development.
Hashing technology has the advantages of reducing data storage and improving the efficiency of the learning system, making it increasingly widely used in image retrieval. Multi-view data describe image information more comprehensively than traditional single-view methods, but how to combine multi-view data with hashing for image retrieval remains a challenge. In this paper, a multi-view fusion hashing method based on RKCCA (Random Kernel Canonical Correlation Analysis) is proposed. To describe image content more accurately, we construct multiple views by combining the deep convolutional network feature DenseNet with the GIST feature or the BoW_SIFT (Bag-of-Words model + SIFT) feature. The algorithm uses RKCCA to fuse the multi-view features into association features and applies them to image retrieval, and generates binary hash codes with minimal distortion error by designing quantization regularization terms. Extensive experiments on benchmark datasets show that this method is superior to other multi-view hashing methods.
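The abstract does not spell out RKCCA's construction, but the standard way to randomize a kernel before a linear method such as CCA is random Fourier features. The sketch below approximates an RBF kernel; the bandwidth `gamma` and feature dimension `D` are illustrative assumptions.

```python
import numpy as np

# Random Fourier features: an explicit map z(x) such that z(x)·z(y) approximates
# the RBF kernel exp(-gamma * ||x - y||^2). Linear CCA on z-features then
# approximates kernel CCA. gamma and D are assumed, not taken from the paper.
def rff(X, D=4000, gamma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

Applying `rff` to each view and running ordinary CCA on the mapped features is one plausible reading of the "random kernel" idea.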
Educational Data Mining (EDM) is an emergent discipline that concentrates on the design of self-learning and adaptive approaches. Higher education institutions have started to utilize analytical tools to improve students' grades and retention. Predicting students' performance is difficult owing to the massive quantity of educational data, so Artificial Intelligence (AI) techniques can be used for educational data mining in a big-data environment. At the same time, feature selection is necessary in EDM for creating feature subsets; since feature selection affects the predictive performance of any model, its effect on student-performance models should be investigated carefully. With this motivation, this paper presents a new Metaheuristic Optimization-based Feature Subset Selection with an Optimal Deep Learning model (MOFSS-ODL) for predicting students' performance. The proposed model uses an isolation-forest-based outlier detection approach to eliminate outliers. Besides, the Chaotic Monarch Butterfly Optimization Algorithm (CBOA) is used to select highly related features with low complexity and high performance. Then, a sailfish optimizer with a stacked sparse autoencoder (SFO-SSAE) approach is utilized for the classification of educational data. The MOFSS-ODL model is tested against a benchmark students' performance dataset from the UCI repository. A wide-ranging simulation analysis portrayed the improved predictive performance of the MOFSS-ODL technique over recent approaches in terms of different measures. Experimental results show that the proposed MOFSS-ODL classification model predicts students' academic progress well, with an accuracy of 96.49%.
This paper presents a state-of-the-art machine-learning-based approach for automating a varied class of Internet of Things (IoT) analytics problems targeted at 1-dimensional (1-D) sensor data. As feature recommendation is a major bottleneck for general IoT-based applications, this paper shows how this step can be automated using a Wide Learning architecture without sacrificing decision-making accuracy, thereby reducing development time and the cost of hiring expensive specialist resources for specific problems. Interpretation of meaningful features is another contribution of this research. Several datasets from different real-world applications are considered to realize the proof of concept. Results show that the interpretable feature recommendation techniques are quite effective for the problems at hand, both in performance and in a drastic reduction of development time.
With the increasing variety of application software in meteorological satellite ground systems, how to provision reasonable hardware resources and improve software efficiency is receiving more and more attention. This paper proposes a software classification method based on software operating characteristics, using run-time resource consumption to describe how the software runs. Firstly, principal component analysis (PCA) is used to reduce the dimensionality of the software running-feature data and to interpret the software's characteristic information. Then a modified K-means algorithm is used to classify the meteorological data-processing software. Finally, the classification is combined with the PCA results to explain the operating characteristics of each software category, providing a basis for optimizing the allocation of hardware resources and improving the efficiency of software operation.
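The PCA-then-K-means pipeline described above can be sketched in a few lines. The feature matrix, component count, and cluster count below are stand-ins, not the system's real resource-consumption data or settings.

```python
import numpy as np

# Project features onto the top-k principal components via SVD.
def pca(X, k):
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Plain Lloyd's K-means on the reduced features (the paper's modified
# variant is not detailed in the abstract).
def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):                    # keep old center if cluster empties
                centers[j] = pts.mean(axis=0)
    return labels, centers
```

The PCA loadings (`Vt`) then serve the same interpretive role the abstract mentions: they show which resource features dominate each software class.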
Through analyzing samples collected from three Chinese-English learners in the ICLE project (Portsmouth Chinese-English learner corpus), this analysis project aims to describe the grammatical status of some non-native features in Chinese students' writing and to answer two questions: (1) Do these features seem to be performance mistakes (i.e., are they random), or is there evidence that they reflect an interlanguage grammar (ILG) (i.e., that they are systematic errors)? (2) In the case of systematic errors, do they seem to be errors transferred from the L1 (the students' first language), or developmental errors (shared by learners from other L1 backgrounds)?
With the rapid advancement of artificial intelligence, research on enabling computers to assist humans in achieving intelligent augmentation, thereby enhancing the accuracy and efficiency of information perception and processing, has been steadily evolving. Among these developments, human motion capture technology has been advancing rapidly, leading to increasing diversity in motion-capture data types. This diversity necessitates a unified standard for multi-source data so that their ability to represent human motion can be analyzed and compared effectively. Additionally, motion-capture data often suffer from significant noise, acquisition delays, and asynchrony, making their effective processing and visualization a critical challenge. In this paper, we used data collected from a prototype of flexible fabric-based motion-capture clothing and from optical motion-capture devices as inputs. We conducted time synchronization and error analysis between the two data types, segmented individual actions from continuous motion sequences, and presented the processed results through a concise and intuitive visualization interface. Finally, we evaluated various system metrics, including the accuracy of time synchronization, the error of fitting fabric resistance to joint angles, the precision of motion segmentation, and user feedback.
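A common baseline for the time-synchronization step between two capture streams is to estimate the integer sample lag that maximizes their cross-correlation. The paper's exact synchronization method is not given in the abstract; this is a generic sketch.

```python
import numpy as np

# Estimate the sample lag between two equal-length channels by normalized
# cross-correlation. A positive result means `a` lags `b`; negative means
# `a` leads `b`.
def estimate_lag(a, b):
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    corr = np.correlate(a, b, mode="full")
    return int(np.argmax(corr)) - (len(b) - 1)
```

In practice one channel would be a fabric-resistance signal and the other an optical joint angle, resampled to a common rate before correlating.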
Conventional observations, precipitation data from regional automatic stations, 1°×1° NCEP reanalysis data, TBB imagery from the FY-2C geostationary meteorological satellite, and Doppler radar data were used to analyze a heavy precipitation process in Hunan Province from June 8 to 10. The results indicated that this heavy precipitation occurred as the western Pacific subtropical high jumped northward and then retreated southward rapidly, while the low- and mid-level shear line was maintained and swung over the region for a long period under a favorable configuration of temperature and moisture energy. The analysis found that the precipitation periods and areas corresponded well to the radar products and satellite TBB imagery, to areas of high low-level pseudo-equivalent potential temperature (θse) and high convective available potential energy (CAPE), and to regions of strong convergence and ascent. As the lead time increased, the T639 and EC forecasts placed the western ridge point of the western Pacific subtropical high farther and farther east and made it weaker and weaker, which introduced some deviations in forecasting the heavy precipitation area.
The classification of functional data has drawn much attention in recent years. The main challenge is representing infinite-dimensional functional data with finite-dimensional features while using those features to achieve better classification accuracy. In this paper, we propose a mean-variance-based (MV) feature weighting method for classifying functional data or functional curves. In the feature extraction stage, each sample curve is approximated by B-splines, transferring the features to the coefficients of the spline basis. A feature weighting approach based on statistical principles is then introduced, comprehensively considering the between-class differences and within-class variations of the coefficients. We also introduce a scaling parameter to adjust the gap between feature weights. The new approach can adaptively enhance noteworthy local features while mitigating the impact of confusing ones. Algorithms for feature-weighted K-nearest neighbor and support vector machine classifiers are both provided. Moreover, the approach integrates well into existing functional data classifiers, such as the generalized functional linear model and functional linear discriminant analysis, resulting in more accurate classification. The performance of the mean-variance-based classifiers is evaluated in simulation studies and on real data. The results show that the new feature weighting approach significantly improves classification accuracy for complex functional data.
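The mean-variance weighting idea can be sketched in a Fisher-score flavour on generic coefficient features: features with large between-class mean differences and small within-class variance get large weights. The exact formula in the paper may differ; here `alpha` plays the role of the scaling parameter that adjusts the gap between weights (the name is an assumption).

```python
import numpy as np

# Mean-variance style feature weights: between-class scatter over
# within-class variance, per feature, raised to a gap-adjusting power alpha.
def mv_weights(X, y, alpha=1.0, eps=1e-12):
    classes = np.unique(y)
    mu = X.mean(axis=0)
    between = sum(((X[y == c].mean(axis=0) - mu) ** 2) * np.mean(y == c) for c in classes)
    within = sum(X[y == c].var(axis=0) * np.mean(y == c) for c in classes)
    w = (between / (within + eps)) ** alpha   # alpha widens or narrows the gap
    return w / w.sum()                        # normalize to sum to 1
```

In the functional setting, the columns of `X` would be the B-spline coefficients of each curve, and the weights would rescale distances in a weighted K-NN or SVM kernel.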
The epidemic characteristics of Omicron (e.g., large-scale transmission) differ significantly from those of the initial variants of COVID-19, and the data generated by large-scale transmission are important for predicting epidemic trends. However, the results of current prediction models are inaccurate because they are not closely combined with the actual situation of Omicron transmission, and these inaccurate results negatively affect manufacturing and the service industry, for example the production of masks and the recovery of the tourism industry. The authors studied the epidemic characteristics in two ways: investigation and prediction. First, a large amount of data was collected using the Baidu index and a questionnaire survey concerning epidemic characteristics. Second, the β-SEIDR model was established, in which the population is classified into Susceptible, Exposed, Infected, Dead, and β-Recovered persons, to intelligently predict the epidemic characteristics of COVID-19; β-Recovered means that Recovered persons may become Susceptible again with probability β. Simulation results show that the model can accurately predict the epidemic characteristics.
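A compartment model with reinfection, in the spirit of β-SEIDR, can be integrated with a simple Euler scheme: Recovered individuals flow back to Susceptible at rate `beta`. All rate constants below are illustrative assumptions, not fitted values from the paper.

```python
# Euler integration of an SEIDR model where Recovered return to Susceptible
# at rate beta (the β-SEIDR idea). trans/incub/recov/mort are assumed rates.
def seidr(S, E, I, D, R, days, trans=0.3, incub=0.2, recov=0.1, mort=0.001, beta=0.01):
    N = S + E + I + R                     # living population
    hist = []
    for _ in range(days):
        new_E = trans * S * I / N         # new exposures
        dS = -new_E + beta * R            # reinfection flow R -> S
        dE = new_E - incub * E
        dI = incub * E - (recov + mort) * I
        dD = mort * I
        dR = recov * I - beta * R
        S += dS; E += dE; I += dI; D += dD; R += dR
        N = S + E + I + R
        hist.append((S, E, I, D, R))
    return hist
```

Because every outflow appears as an inflow elsewhere, total population S+E+I+D+R is conserved exactly, which is a useful sanity check on any implementation.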
As big data, its technologies, and its applications continue to advance, the Smart Grid (SG) has become one of the most successful pervasive and fixed computing platforms, efficiently using a data-driven approach together with information and communication technology (ICT) and cloud computing. Because of the complicated architecture of cloud computing, the distinctive operation of advanced metering infrastructure (AMI), and the use of sensitive data, securing the SG has become challenging. Faults of the SG fall into two main categories: Technical Losses (TLs) and Non-Technical Losses (NTLs). Hardware failure, communication issues, ohmic losses, and energy burnout during transmission and propagation are TLs. NTLs are human-induced losses with malicious purposes, such as attacks on sensitive data, electricity theft, and tampering with AMI by fraudulent customers to reduce their bills. This research proposes a data-driven methodology based on computational intelligence and big-data analysis to identify fraudulent customers from their load profiles. In the proposed methodology, a hybrid Genetic Algorithm and Support Vector Machine (GA-SVM) model is used to extract the relevant subset of features from a large, unsupervised public smart-grid project dataset from London, UK, for theft detection. A subset of 26 out of 71 features is obtained, with a classification accuracy of 96.6%, compared with studies conducted on small and limited datasets.
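The genetic-algorithm stage of such a GA-SVM pipeline searches over binary feature masks. The toy sketch below keeps the GA machinery (truncation selection, one-point crossover, bit-flip mutation) but, for brevity, replaces the SVM cross-validation fitness with a precomputed per-feature score minus a subset-size penalty; that objective, and all rates, are assumptions, not the paper's.

```python
import numpy as np

# Toy GA over binary feature masks. `scores` holds a relevance score per
# feature; fitness = sum of selected scores minus a size penalty (a stand-in
# for the SVM accuracy used in GA-SVM).
def ga_select(scores, pop=30, gens=40, penalty=0.2, seed=0):
    rng = np.random.default_rng(seed)
    d = len(scores)
    P = rng.integers(0, 2, size=(pop, d))                # random initial masks
    fit = lambda m: (scores * m).sum() - penalty * m.sum()
    for _ in range(gens):
        f = np.array([fit(m) for m in P])
        parents = P[np.argsort(f)[-pop // 2:]]           # keep the top half
        kids = parents.copy()
        cut = rng.integers(1, d, size=len(kids))
        for i in range(0, len(kids) - 1, 2):             # one-point crossover
            c = cut[i]
            kids[i, c:], kids[i + 1, c:] = parents[i + 1, c:].copy(), parents[i, c:].copy()
        flip = rng.random(kids.shape) < 0.05             # bit-flip mutation
        kids = np.where(flip, 1 - kids, kids)
        P = np.vstack([parents, kids])
    f = np.array([fit(m) for m in P])
    return P[np.argmax(f)]
```

In the full method, evaluating `fit` would mean training an SVM on the masked features and scoring it on held-out data.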
The rising popularity of online social networks (OSNs), such as Twitter, Facebook, MySpace, and LinkedIn, has sparked great interest in sentiment analysis of their data. While many methods exist for identifying sentiment in OSNs, such as communication-pattern mining and classification based on emoticons and parts of speech, most of them use a suboptimal batch-mode learning approach when analyzing large amounts of real-time data. As an alternative, we present a streaming algorithm using Modified Balanced Winnow for sentiment analysis on OSNs. Tested on three real-world network datasets, the performance of our sentiment predictions is close to that of batch learning, with the added ability to detect important features dynamically in data streams. These top features reveal key words important to the analysis of sentiment.
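For context, plain Balanced Winnow, the online learner that Modified Balanced Winnow builds on, can be sketched as follows. The modifications (margin and thick thresholds) are omitted, and the promotion/demotion rates and threshold are illustrative; feature values are assumed non-negative, as with bag-of-words text features.

```python
# Balanced Winnow: two positive weight vectors u and v; predict by the sign
# of (u - v)·x - theta; on a mistake, multiplicatively promote or demote the
# weights of the active features.
def balanced_winnow_fit(stream, d, promo=2.0, demo=0.5, theta=1.0):
    u = [1.0] * d
    v = [1.0] * d
    for x, y in stream:                       # x: d non-negative features, y: +1/-1
        score = sum((u[i] - v[i]) * x[i] for i in range(d)) - theta
        yhat = 1 if score >= 0 else -1
        if yhat != y:                         # update only on mistakes
            for i in range(d):
                if x[i] > 0:
                    if y > 0:
                        u[i] *= promo; v[i] *= demo
                    else:
                        u[i] *= demo; v[i] *= promo
    return u, v
```

The mistake-driven multiplicative updates are what make the learner cheap enough for streams, and the per-feature weight ratios u[i]/v[i] directly expose the "top features" the abstract mentions.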
This paper presents a gridding-parameter extraction method for setting optimal interpolation nodes when gridding scattered observed data. The method extracts optimized gridding parameters from the distribution of features in the raw data, and modeling analysis shows that distortion caused by gridding can be greatly reduced when such parameters are used. We also present improved technical measures that use human-machine interaction and multi-thread parallelism to address inadequacies in traditional gridding software. On the basis of these methods, we have developed software that grids scattered data through a graphical interface. Finally, a comparison of different gridding parameters on field magnetic data from Jilin Province, North China, demonstrates the superiority of the proposed method in eliminating distortion and improving gridding efficiency.
Automatic road detection in dense urban areas is a challenging application in the remote sensing community, mainly because of the physical and geometrical variations of road pixels, their spectral similarity to other features such as buildings, parking lots, and sidewalks, and obstruction by vehicles and trees. These problems are real obstacles to the precise detection and identification of urban roads from high-resolution satellite imagery. One promising strategy is to use multi-sensor data to reduce detection uncertainty. In this paper, an integrated object-based analysis framework was developed for detecting and extracting various types of urban roads from high-resolution optical images and LiDAR data. The proposed method is designed and implemented as a rule-oriented approach based on a masking strategy. The overall accuracy (OA) of the final road map was 89.2%, and the kappa coefficient of agreement was 0.83, demonstrating the efficiency and performance of the method under different conditions and inter-class noise. The results also demonstrate the high capability of this object-based method for the simultaneous identification of a wide variety of road elements in complex urban areas using both high-resolution satellite images and LiDAR data.
This paper proposes a model to analyze massive electricity data. A feature subset is determined by correlation-based feature selection and data-driven methods. Using the selected feature subset, the attribute season can be classified successfully by five classifiers, from which the best model is determined. The effects of three other attributes, months, businesses, and meters, on electricity-consumption analysis are then estimated using the chosen model. The data used for the project were provided by the Beijing Power Supply Bureau, and WEKA was used as the machine learning tool. The resulting models are promising for electricity scheduling and power-theft detection.
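Correlation-based feature selection (CFS) scores a subset by how correlated its features are with the class and how uncorrelated they are with each other. Below is a greedy forward search with the standard CFS merit; the paper's exact search strategy is not stated in the abstract, so this is a generic sketch.

```python
import numpy as np

# Greedy forward CFS: merit(S) = k * mean|corr(f, y)| /
#   sqrt(k + k*(k-1) * mean|corr(f_i, f_j)|), grown one feature at a time
# while the merit improves.
def cfs(X, y, max_feats=5):
    d = X.shape[1]
    rcf = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(d)])
    rff = np.abs(np.corrcoef(X, rowvar=False))        # feature-feature |corr|
    chosen, best = [], 0.0
    while len(chosen) < max_feats:
        cand = None
        for j in range(d):
            if j in chosen:
                continue
            s = chosen + [j]
            k = len(s)
            mean_rcf = rcf[s].mean()
            mean_rff = (rff[np.ix_(s, s)].sum() - k) / (k * (k - 1)) if k > 1 else 0.0
            merit = k * mean_rcf / np.sqrt(k + k * (k - 1) * mean_rff)
            if merit > best:
                best, cand = merit, j
        if cand is None:                              # no improvement: stop
            break
        chosen.append(cand)
    return chosen
```

The merit penalizes redundancy, so a feature nearly duplicating one already chosen is rejected even if it correlates well with the class.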
Funding (Ming Tombs bird-watching study): Sponsored by the Beijing Youth Innovation Talent Support Program for Urban Greening and Landscaping, 2024 Special Project for Promoting High-Quality Development of Beijing's Landscaping through Scientific and Technological Innovation (KJCXQT202410).
Funding (multi-view fusion hashing study): This work is supported by the National Natural Science Foundation of China (No. 61772561), the Key Research & Development Plan of Hunan Province (No. 2018NK2012), the Science Research Projects of Hunan Provincial Education Department (Nos. 18A174, 18C0262), and the Science & Technology Innovation Platform and Talent Plan of Hunan Province (2017TP1022).
文摘Hashing technology has the advantages of reducing data storage and improving the efficiency of the learning system,making it more and more widely used in image retrieval.Multi-view data describes image information more comprehensively than traditional methods using a single-view.How to use hashing to combine multi-view data for image retrieval is still a challenge.In this paper,a multi-view fusion hashing method based on RKCCA(Random Kernel Canonical Correlation Analysis)is proposed.In order to describe image content more accurately,we use deep learning dense convolutional network feature DenseNet to construct multi-view by combining GIST feature or BoW_SIFT(Bag-of-Words model+SIFT feature)feature.This algorithm uses RKCCA method to fuse multi-view features to construct association features and apply them to image retrieval.The algorithm generates binary hash code with minimal distortion error by designing quantization regularization terms.A large number of experiments on benchmark datasets show that this method is superior to other multi-view hashing methods.
文摘Educational Data Mining(EDM)is an emergent discipline that concen-trates on the design of self-learning and adaptive approaches.Higher education institutions have started to utilize analytical tools to improve students’grades and retention.Prediction of students’performance is a difficult process owing to the massive quantity of educational data.Therefore,Artificial Intelligence(AI)techniques can be used for educational data mining in a big data environ-ment.At the same time,in EDM,the feature selection process becomes necessary in creation of feature subsets.Since the feature selection performance affects the predictive performance of any model,it is important to elaborately investigate the outcome of students’performance model related to the feature selection techni-ques.With this motivation,this paper presents a new Metaheuristic Optimiza-tion-based Feature Subset Selection with an Optimal Deep Learning model(MOFSS-ODL)for predicting students’performance.In addition,the proposed model uses an isolation forest-based outlier detection approach to eliminate the existence of outliers.Besides,the Chaotic Monarch Butterfly Optimization Algo-rithm(CBOA)is used for the selection of highly related features with low com-plexity and high performance.Then,a sailfish optimizer with stacked sparse autoencoder(SFO-SSAE)approach is utilized for the classification of educational data.The MOFSS-ODL model is tested against a benchmark student’s perfor-mance data set from the UCI repository.A wide-ranging simulation analysis por-trayed the improved predictive performance of the MOFSS-ODL technique over recent approaches in terms of different measures.Compared to other methods,experimental results prove that the proposed(MOFSS-ODL)classification model does a great job of predicting students’academic progress,with an accuracy of 96.49%.
Abstract: This paper presents a state-of-the-art machine-learning-based approach for automating a varied class of Internet of Things (IoT) analytics problems targeted at 1-dimensional (1-D) sensor data. As feature recommendation is a major bottleneck for general IoT-based applications, this paper shows how this step can be automated with a Wide Learning architecture without sacrificing decision-making accuracy, thereby reducing development time and the cost of hiring expensive resources for specific problems. Interpretation of meaningful features is another contribution of this research. Several datasets from different real-world applications are considered to realize the proof of concept. Results show that the interpretable feature recommendation techniques are quite effective for the problems at hand, both in performance and in a drastic reduction of development time.
Abstract: With the increasing variety of application software in meteorological satellite ground systems, how to provision hardware resources reasonably and improve software efficiency has received more and more attention. This paper proposes a software classification method based on software operating characteristics, which are described by run-time resource consumption. First, principal component analysis (PCA) is used to reduce the dimension of the software running-feature data and to interpret the software characteristic information. A modified K-means algorithm is then used to classify the meteorological data processing software. Finally, the PCA results are combined to explain the operating characteristics of each software class, which serves as the basis for optimizing the allocation of hardware resources and improving software operating efficiency.
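The clustering step above can be sketched with plain K-means over per-package resource-consumption vectors. This is a minimal sketch under assumptions: the paper uses a *modified* K-means whose details the abstract does not give, and the `[CPU %, memory %]` profiles below are hypothetical.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means (the paper's modified variant is unspecified)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean).
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        # Recompute centers as cluster means; keep old center if empty.
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

# Hypothetical run-time profiles: [CPU %, memory %] per software package.
profiles = [[80, 10], [78, 12], [75, 15], [10, 70], [12, 75], [15, 72]]
centers, clusters = kmeans(profiles, k=2)
```

On this toy input the CPU-bound and memory-bound packages separate into the two clusters, which is the kind of grouping the paper then interprets via the principal components.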
Abstract: By analyzing samples collected from three Chinese-English learners in the ICLE project (Portsmouth Chinese-English learner corpus), this analysis project aims to describe the grammatical status of some non-native features in Chinese students' writing and to answer two questions: (1) Do these features seem to be performance mistakes (i.e., are they random), or is there evidence that they reflect an interlanguage grammar (ILG), i.e., that they are systematic errors? (2) In the case of systematic errors, do they seem to be transferred from the L1 (the students' first language), or do they seem to be developmental errors (shared by learners from other L1 backgrounds)?
Funding: Supported by the National Natural Science Foundation of China (62072383, 61702433), the Public Technology Service Platform Project of Xiamen City (No. 3502Z20231043), the Xiaomi Young Talents Program / Xiaomi Foundation, and the Fundamental Research Funds for the Central Universities, China; also supported by the National Natural Science Foundation of China (62077039) and the Fundamental Research Funds for the Central Universities, China (20720230106).
Abstract: With the rapid advancement of artificial intelligence, research on enabling computers to assist humans in achieving intelligent augmentation, thereby enhancing the accuracy and efficiency of information perception and processing, has been steadily evolving. Among these developments, human motion capture technology has advanced rapidly, leading to an increasing diversity of motion capture data types. This diversity necessitates a unified standard for multi-source data so that their capability to represent human motion can be analyzed and compared effectively. Additionally, motion capture data often suffer from significant noise, acquisition delays, and asynchrony, making their effective processing and visualization a critical challenge. In this paper, we used data collected from a prototype of flexible fabric-based motion capture clothing and from optical motion capture devices as inputs. Time synchronization and error analysis between the two data types were conducted, individual actions were segmented from continuous motion sequences, and the processed results were presented through a concise and intuitive visualization interface. Finally, we evaluated various system metrics, including the accuracy of time synchronization, the error in fitting fabric resistance to joint angles, the precision of motion segmentation, and user feedback.
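The time-synchronization step between the fabric and optical streams can be illustrated with a simple cross-correlation lag search, which finds the sample offset that best aligns two signals. This is a common approach offered as a sketch; the abstract does not state the paper's actual synchronization method, and `best_lag` is an illustrative name.

```python
def best_lag(a, b, max_lag):
    """Return the shift of b (in samples) that best aligns it with a,
    by maximizing the dot-product cross-correlation over [-max_lag, max_lag]."""
    def score(lag):
        pairs = [(a[i], b[i - lag]) for i in range(len(a))
                 if 0 <= i - lag < len(b)]
        return sum(x * y for x, y in pairs)
    return max(range(-max_lag, max_lag + 1), key=score)

# Toy example: the "optical" stream leads the "fabric" stream by 3 samples.
fabric = [0, 0, 0, 0, 5, 9, 5, 0, 0, 0]     # e.g. a joint-angle peak
optical = fabric[3:] + [0, 0, 0]             # same peak, 3 samples earlier
lag = best_lag(fabric, optical, max_lag=5)   # recovers the offset of 3
```

Once the lag is known, one stream is shifted by that many samples before error analysis and segmentation.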
Abstract: Conventional observations, precipitation data from regional automatic stations, 1°×1° NCEP reanalysis data, TBB imagery from the FY-2C geostationary meteorological satellite, and Doppler radar data were used to analyze the heavy precipitation process in Hunan Province from June 8 to 10. The results indicate that this process occurred while the western Pacific subtropical high jumped northward and then retreated southward rapidly, and while the low- and mid-level shear line was maintained and swung over the region for a long period under a favorable configuration of temperature and moisture energy. The precipitation period and precipitation area corresponded well to the radar products and satellite TBB images, to the area of high low-level pseudo-equivalent potential temperature (θse), and to the areas of high convective available potential energy (CAPE) and strong convergent ascent. As the forecast lead time extended, the positions of the western ridge point of the western Pacific subtropical high forecast by T639 and EC became progressively more easterly and the forecast intensity weaker, which introduced some deviations in forecasting the heavy precipitation area.
Funding: Supported by the National Social Science Foundation of China (Grant No. 22BTJ035).
Abstract: The classification of functional data has drawn much attention in recent years. The main challenge is representing infinite-dimensional functional data by finite-dimensional features while using those features to achieve better classification accuracy. This paper proposes a mean-variance-based (MV) feature weighting method for classifying functional data or functional curves. In the feature extraction stage, each sample curve is approximated by B-splines, transferring the features to the coefficients of the spline basis. A feature weighting approach based on statistical principles is then introduced by comprehensively considering the between-class differences and within-class variations of the coefficients, with a scaling parameter to adjust the gap between feature weights. The new feature weighting approach can adaptively enhance noteworthy local features while mitigating the impact of confusing features. Algorithms for feature-weighted K-nearest-neighbor and support vector machine classifiers are both provided. Moreover, the new approach integrates well into existing functional data classifiers, such as the generalized functional linear model and functional linear discriminant analysis, resulting in more accurate classification. The performance of the mean-variance-based classifiers is evaluated on simulation studies and real data. The results show that the new feature weighting approach significantly improves classification accuracy for complex functional data.
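The weighting idea, features whose between-class means are far apart relative to their within-class scatter should count more, can be sketched with a Fisher-style score per coefficient. This is a sketch under assumptions: the paper's exact MV formula is not given in the abstract, and `gamma` here plays the role of its scaling parameter that widens or narrows the gap between weights.

```python
from statistics import mean, pvariance

def mv_weights(X, y, gamma=1.0):
    """Per-feature weights: between-class mean spread over within-class
    variance, raised to the power gamma, then normalized to sum to 1.
    The exact formula is an assumption, not taken from the paper."""
    classes = sorted(set(y))
    weights = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        grand = mean(col)
        between = within = 0.0
        for c in classes:
            grp = [row[j] for row, t in zip(X, y) if t == c]
            between += len(grp) * (mean(grp) - grand) ** 2   # class separation
            within += len(grp) * pvariance(grp)              # class scatter
        weights.append((between / within) ** gamma if within > 0 else 0.0)
    total = sum(weights)
    return [w / total for w in weights] if total else weights
```

Applied to B-spline coefficients, a feature-weighted KNN would then use these weights inside its distance metric; larger `gamma` sharpens the contrast between informative and confusing coefficients.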
Funding: Key Discipline Construction Project for Traditional Chinese Medicine in Guangdong Province, Grant/Award Number: 20220104; Construction Project of the Inheritance Studio of National Famous and Veteran Traditional Chinese Medicine Experts, Grant/Award Number: 140000020132.
Abstract: The epidemic characteristics of Omicron (e.g., large-scale transmission) differ significantly from those of the initial variants of COVID-19. The data generated by large-scale transmission are important for predicting the trend of these characteristics. However, the results of current prediction models are inaccurate because the models are not closely combined with the actual situation of Omicron transmission, and these inaccurate results have negative impacts on manufacturing and the service industry, for example the production of masks and the recovery of the tourism industry. The authors study the epidemic characteristics in two ways: investigation and prediction. First, a large amount of data is collected using the Baidu index and a questionnaire survey concerning epidemic characteristics. Second, a β-SEIDR model is established, in which the population is classified into Susceptible, Exposed, Infected, Dead, and β-Recovered persons, to intelligently predict the epidemic characteristics of COVID-19; β-Recovered denotes that Recovered persons may become Susceptible again with probability β. The simulation results show that the model can accurately predict the epidemic characteristics.
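The compartment structure of a β-SEIDR model can be sketched as a forward-Euler integration of the five flows, including the extra β flow that returns Recovered persons to the Susceptible pool. The rate constants below are illustrative placeholders, not the paper's fitted values.

```python
def beta_seidr(S, E, I, D, R, days, contact=0.3, incubate=0.2,
               recover=0.1, death=0.01, beta=0.05, dt=1.0):
    """Toy Euler integration of a beta-SEIDR model. `beta` is the rate at
    which Recovered persons lose immunity and become Susceptible again.
    All rates are illustrative assumptions, not the paper's values."""
    N = S + E + I + D + R
    hist = [(S, E, I, D, R)]
    for _ in range(int(days / dt)):
        new_exposed = contact * S * I / N * dt     # S -> E
        new_infected = incubate * E * dt           # E -> I
        new_recovered = recover * I * dt           # I -> R
        new_dead = death * I * dt                  # I -> D
        resusceptible = beta * R * dt              # R -> S (the beta flow)
        S += resusceptible - new_exposed
        E += new_exposed - new_infected
        I += new_infected - new_recovered - new_dead
        R += new_recovered - resusceptible
        D += new_dead
        hist.append((S, E, I, D, R))
    return hist
```

Because every outflow from one compartment is an inflow to another, the total population is conserved at each step, and the cumulative death count can only grow, two sanity checks worth keeping when fitting such a model to survey data.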
Funding: This research is funded by Fayoum University, Egypt.
Abstract: As big data, its technologies, and its applications continue to advance, the Smart Grid (SG) has become one of the most successful pervasive and fixed computing platforms, efficiently using a data-driven approach together with information and communication technology (ICT) and cloud computing. Because of the complicated architecture of cloud computing, the distinctive operation of advanced metering infrastructures (AMI), and the use of sensitive data, securing the SG has become challenging. Faults of the SG fall into two main categories: Technical Losses (TLs) and Non-Technical Losses (NTLs). Hardware failure, communication issues, ohmic losses, and energy burnout during transmission and propagation are TLs. NTLs are human-induced losses for malicious purposes, such as attacks on sensitive data and electricity theft, along with tampering with AMI for bill reduction by fraudulent customers. This research proposes a data-driven methodology based on principles of computational intelligence and big-data analysis to identify fraudulent customers from their load profiles. In the proposed methodology, a hybrid Genetic Algorithm and Support Vector Machine (GA-SVM) model extracts the relevant subset of features from a large, unsupervised public smart-grid project dataset from London, UK, for theft detection. A subset of 26 out of 71 features is obtained with a classification accuracy of 96.6%, compared to studies conducted on small and limited datasets.
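The GA half of a GA-SVM pipeline searches over feature bitmasks, scoring each candidate subset by classifier accuracy. The sketch below shows that search loop only, with the SVM fitness replaced by a cheap toy scoring function so the example stays self-contained; `ga_select` and `toy_fitness` are illustrative names, and the hyperparameters are assumptions.

```python
import random

def ga_select(n_feat, fitness, pop=20, gens=30, p_mut=0.05, seed=0):
    """Minimal genetic algorithm over feature bitmasks with elitism.
    In the paper the fitness would be SVM accuracy on the load profiles;
    here any callable that scores a 0/1 mask can be plugged in."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_feat)]
                  for _ in range(pop)]
    for _ in range(gens):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop // 2]                 # keep the better half
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)            # one-point crossover
            cut = rng.randrange(1, n_feat)
            child = [bit ^ (rng.random() < p_mut)    # bit-flip mutation
                     for bit in a[:cut] + b[cut:]]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Hypothetical fitness: reward selecting features 0 and 2, penalize subset
# size (a stand-in for cross-validated SVM accuracy minus a size penalty).
toy_fitness = lambda m: 2 * m[0] + 2 * m[2] - sum(m)
best = ga_select(n_feat=6, fitness=toy_fitness)
```

Swapping `toy_fitness` for an SVM cross-validation score on the selected columns recovers the GA-SVM wrapper structure the abstract describes.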
Abstract: The rising popularity of online social networks (OSNs), such as Twitter, Facebook, MySpace, and LinkedIn, has sparked great interest in sentiment analysis on their data. While many methods exist for identifying sentiment in OSNs, such as communication pattern mining and classification based on emoticons and parts of speech, most of them use a suboptimal batch-mode learning approach when analyzing large amounts of real-time data. As an alternative, we present a stream algorithm using Modified Balanced Winnow for sentiment analysis on OSNs. Tested on three real-world network datasets, the performance of our sentiment predictions is close to that of batch learning, with the ability to detect features important for sentiment analysis dynamically in data streams. These top features reveal key words important to the analysis of sentiment.
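Balanced Winnow keeps a positive and a negative weight vector and updates them multiplicatively only on margin mistakes, which is what makes it cheap enough for streams. The sketch below is in that spirit but simplified: the exact update schedule of the paper's Modified Balanced Winnow is not given in the abstract, and the initial weights, `alpha`, and `margin` are assumptions.

```python
def train_mbw(samples, n_feat, alpha=1.5, margin=1.0, epochs=5):
    """Balanced Winnow with a margin. u and v are the positive and
    negative weight vectors; the effective weight of feature j is u[j]-v[j]."""
    u = [2.0] * n_feat
    v = [1.0] * n_feat
    for _ in range(epochs):
        for x, label in samples:            # label in {+1, -1}
            score = sum((u[j] - v[j]) * x[j] for j in range(n_feat))
            if score * label <= margin:     # mistake or too-small margin
                for j in range(n_feat):
                    if x[j] > 0:            # update only active features
                        if label > 0:
                            u[j] *= alpha   # promote
                            v[j] /= alpha
                        else:
                            u[j] /= alpha   # demote
                            v[j] *= alpha
    return u, v

def predict(u, v, x):
    return 1 if sum((u[j] - v[j]) * x[j] for j in range(len(x))) > 0 else -1

# Tiny bag-of-words sentiment demo; vocabulary = ["good", "bad", "movie"].
data = [([1, 0, 1], 1), ([0, 1, 1], -1)]
u, v = train_mbw(data, n_feat=3)
```

The effective weights `u[j] - v[j]` double as the dynamically detected "top features": after training on the toy stream, "good" carries a large positive weight and "bad" a negative one.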
Funding: Partly supported by the Public Geological Survey Project (No. 201011039), the National High Technology Research and Development Project of China (No. 2007AA06Z134), and the 111 Project under the Ministry of Education and the State Administration of Foreign Experts Affairs, China (No. B07011).
Abstract: This paper presents a gridding-parameter extraction method for setting the optimal interpolation nodes when gridding scattered observed data. The method extracts optimized gridding parameters based on the distribution of features in the raw data, and modeling analysis proves that distortion caused by gridding can be greatly reduced when such parameters are used. We also present improved technical measures that use human-machine interaction and multi-threaded parallelism to overcome inadequacies in traditional gridding software. On the basis of these methods, we have developed software with a graphical interface for gridding scattered data. Finally, a comparison of different gridding parameters on field magnetic data from Jilin Province, North China demonstrates the superiority of the proposed method in eliminating distortions and enhancing gridding efficiency.
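One concrete way to derive a gridding parameter "from the distribution of features in the raw data" is to set the cell size from the spacing of the observations themselves, e.g. the median nearest-neighbour distance. This is only a plausible heuristic offered as a sketch; the paper's actual extraction rule is not given in the abstract.

```python
import math
from statistics import median

def suggest_grid_spacing(points):
    """Suggest a gridding cell size as the median nearest-neighbour
    distance of the scattered (x, y) observations. A hypothetical
    heuristic, not the paper's exact parameter-extraction rule."""
    nn = []
    for i, (x, y) in enumerate(points):
        # Distance to the closest other observation (O(n^2) brute force;
        # a spatial index would be used for large surveys).
        d = min(math.hypot(x - px, y - py)
                for j, (px, py) in enumerate(points) if j != i)
        nn.append(d)
    return median(nn)
```

A grid finer than this spacing forces the interpolator to invent detail between observations, which is one source of the gridding distortion the paper targets; for points sampled on a unit lattice the suggestion is exactly 1.0.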
Abstract: Automatic road detection in dense urban areas is a challenging application in the remote sensing community, mainly because of the physical and geometric variability of road pixels, their spectral similarity to other features such as buildings, parking lots, and sidewalks, and obstruction by vehicles and trees. These problems are real obstacles to the precise detection and identification of urban roads from high-resolution satellite imagery. One promising strategy for dealing with them is to use multi-sensor data to reduce detection uncertainty. In this paper, an integrated object-based analysis framework was developed for detecting and extracting various types of urban roads from high-resolution optical images and Lidar data. The proposed method is designed and implemented as a rule-oriented approach based on a masking strategy. The overall accuracy (OA) of the final road map was 89.2%, and the kappa coefficient of agreement was 0.83, showing the efficiency and performance of the method under different conditions and inter-class noise. The results also demonstrate the high capability of this object-based method for simultaneously identifying a wide variety of road elements in complex urban areas using both high-resolution satellite images and Lidar data.
Funding: Supported by the National Earthquake Major Project of China (201008007) and the Fundamental Research Funds for the Central Universities of China (216275645).
Abstract: This paper proposes a model to analyze massive electricity data. A feature subset is determined by correlation-based feature selection and data-driven methods. Using the selected feature subset, the attribute season can be classified successfully by five classifiers, and the best model is then determined. The effects of three other attributes, months, businesses, and meters, on electricity consumption can be estimated using the chosen model. The data used for the project were provided by the Beijing Power Supply Bureau, and WEKA was used as the machine learning tool. The models we built are promising for electricity scheduling and power theft detection.
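The correlation-based selection step can be sketched as ranking each feature by the absolute Pearson correlation with the target and keeping the top k. This is a simple stand-in: WEKA's CFS also accounts for redundancy between features, which this univariate sketch ignores, and `select_features` is an illustrative name.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    sx, sy = pstdev(xs), pstdev(ys)
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(X, y, k):
    """Rank feature columns of X by |corr(feature, target)| and return
    the indices of the top k (univariate stand-in for CFS)."""
    n_feat = len(X[0])
    scores = [(abs(pearson([row[j] for row in X], y)), j)
              for j in range(n_feat)]
    return [j for _, j in sorted(scores, reverse=True)[:k]]
```

In the paper's setting X would hold load-record attributes and y the target attribute (e.g. season); the surviving columns are then fed to the five candidate classifiers.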