This study explored and reviewed the logistic regression (LR) model, a multivariable method for modeling the relationship between multiple independent variables and a categorical dependent variable, with emphasis on medical research. Thirty-seven research articles published between 2000 and 2018 which employed logistic regression as the main statistical tool, as well as six textbooks on logistic regression, were reviewed. Logistic regression concepts such as odds, odds ratio, logit transformation, the logistic curve, assumptions, selecting dependent and independent variables, model fitting, reporting and interpreting were presented. Upon perusing the literature, considerable deficiencies were found in both the use and reporting of LR. For many studies, the ratio of the number of outcome events to predictor variables (events per variable) was small enough to call into question the accuracy of the regression model. Also, most studies did not report on validation analysis, regression diagnostics or goodness-of-fit measures, which authenticate the robustness of the LR model. Here, we demonstrate a good example of the application of the LR model using data obtained on a cohort of pregnant women and the factors that influence their decision to opt for caesarean delivery or vaginal birth. It is recommended that researchers be more rigorous and pay greater attention to guidelines concerning the use and reporting of LR models.
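The odds, odds ratio and logit concepts reviewed above can be illustrated numerically. The sketch below uses a hypothetical 2x2 table (the exposure/outcome counts are invented for illustration, not taken from the caesarean-delivery cohort):

```python
import math

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

def logit(p):
    """Logit (log-odds) transformation used by logistic regression."""
    return math.log(odds(p))

# Hypothetical 2x2 table: rows = exposed/unexposed, columns = event/no event.
a, b = 30, 70   # exposed:   30 events, 70 non-events
c, d = 10, 90   # unexposed: 10 events, 90 non-events

p_exposed   = a / (a + b)          # 0.30
p_unexposed = c / (c + d)          # 0.10
odds_ratio  = (a * d) / (b * c)    # cross-product ratio

print(odds(p_exposed))                        # ~0.4286
print(odds(p_unexposed))                      # ~0.1111
print(odds_ratio)                             # ~3.857
print(logit(p_exposed) - logit(p_unexposed))  # log of the odds ratio
```

The difference of logits equals the log of the odds ratio, which is exactly why exponentiated LR coefficients are read as odds ratios.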
In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, which employed four classification algorithms: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN) as the main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths and shortcomings of each of the algorithms were examined, and finally, a conclusion was arrived at on which one has higher performance. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset, is occasionally unstable and tends to overfit. KNN is easy to implement and understand but has a major drawback of becoming significantly slow as the size of the data in use grows, while the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability in dealing with unbalanced data. Larger input datasets will lengthen classification times for NN and KNN more than for SVM and RF.
Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning procedure, high level of complexity in computational processing, the numerous types of NN architectures to choose from and the high number of algorithms used for training, most researchers recommend SVM and RF as easier and more widely used methods which repeatedly achieve results with high accuracies and are often faster to implement.
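As a concrete illustration of why KNN slows down as data grows, here is a minimal brute-force KNN classifier in plain Python (a sketch, not any of the reviewed implementations): every prediction scans the entire training set, so prediction cost grows linearly with the number of training points.

```python
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Brute-force K-Nearest-Neighbor: O(n) distance computations per query."""
    dists = []
    for x, label in zip(train_X, train_y):
        d = sum((a - b) ** 2 for a, b in zip(x, query))  # squared Euclidean
        dists.append((d, label))
    dists.sort(key=lambda t: t[0])
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]  # majority vote of k nearest

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))  # "A"
print(knn_predict(train_X, train_y, (5.5, 5.5)))  # "B"
```

The difficulty of choosing k noted above is visible here: k is a free parameter with no training step to set it.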
The burgeoning use of Web 2.0-powered social media in recent years has inspired numerous studies on the content and composition of online social networks (OSNs). Many methods of harvesting useful information from social networks’ immense amounts of user-generated data have been successfully applied to such real-world topics as politics and marketing, to name just a few. This study presents a novel twist on two popular techniques for studying OSNs: community detection and sentiment analysis. Using sentiment classification to enhance community detection, and community partitions to permit more in-depth analysis of sentiment data, these two techniques are brought together to analyze four networks from the Twitter OSN. The Twitter networks used for this study are extracted from four accounts related to Microsoft Corporation, and together encompass more than 60,000 users and 2 million tweets collected over a period of 32 days. By combining community detection and sentiment analysis, modularity values were increased for the community partitions detected in three of the four networks studied. Furthermore, data collected during the community detection process enabled more granular, community-level sentiment analysis on a specific topic referenced by users in the dataset.
Double Q-learning has been shown to be effective in reinforcement learning scenarios when the reward system is stochastic. We apply the idea of double learning that this algorithm uses to Sarsa and Expected Sarsa, producing two new algorithms called Double Sarsa and Double Expected Sarsa that are shown to be more robust than their single counterparts when rewards are stochastic. We find that these algorithms add a significant amount of stability to the learning process at only a minor computational cost, which leads to higher returns when using an on-policy algorithm. We then use shallow and deep neural networks to approximate the action-value, and show that Double Sarsa and Double Expected Sarsa are much more stable after convergence and can collect larger rewards than the single versions.
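A tabular sketch of the double-learning idea that Double Sarsa applies (our reading of the scheme: two tables Q_A and Q_B, one chosen at random each step and updated using the other table's estimate of the actual next action; the step sizes and the toy transition are illustrative, not the paper's exact specification):

```python
import random

def double_sarsa_update(QA, QB, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """One Double Sarsa step: update one table using the other's estimate."""
    if random.random() < 0.5:
        QA[(s, a)] = QA.get((s, a), 0.0) + alpha * (
            r + gamma * QB.get((s2, a2), 0.0) - QA.get((s, a), 0.0))
    else:
        QB[(s, a)] = QB.get((s, a), 0.0) + alpha * (
            r + gamma * QA.get((s2, a2), 0.0) - QB.get((s, a), 0.0))

random.seed(0)
QA, QB = {}, {}
# Replay a fixed terminal transition many times; both tables approach r = 1.
for _ in range(2000):
    double_sarsa_update(QA, QB, "s", "a", 1.0, "terminal", None)
print(QA[("s", "a")], QB[("s", "a")])  # both close to 1.0
```

Because each table is updated with the other's value estimate, a single noisy reward cannot inflate both estimates at once, which is the source of the added stability described above.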
Cronbach’s Alpha coefficient is the most popular method of examining reliability. It is typically used when the researcher has several Likert-type items that are summed or averaged to make a composite score. The distribution of the alpha coefficient has been the subject of many studies. In this study, the relationship between randomness and the Cronbach alpha coefficient was investigated; in this context, the present study examined the question “What is the distribution of the coefficient alpha when a Likert-type scale is answered randomly?” Data were generated in the form of five-point Likert-type items, and a Monte Carlo simulation was run 5000 times for different item numbers.
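The simulation described can be sketched as follows, assuming the standard formula α = k/(k−1) · (1 − Σσ²_item/σ²_total). With purely random (independent) five-point Likert answers, the alpha of each replicate scatters around zero; the replicate counts below are smaller than the study's 5000 runs, for brevity:

```python
import random
import statistics

def cronbach_alpha(rows):
    """rows: list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_vars = [statistics.pvariance([r[i] for r in rows]) for i in range(k)]
    total_var = statistics.pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

random.seed(42)
k, n, reps = 10, 200, 200
alphas = [cronbach_alpha([[random.randint(1, 5) for _ in range(k)]
                          for _ in range(n)])
          for _ in range(reps)]
print(round(statistics.mean(alphas), 3))  # near 0 for purely random answers
```

As a sanity check, k perfectly duplicated items yield α = k/(k−1) · (1 − k·v/(k²·v)) = 1 exactly.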
Social media platforms such as Twitter and the Internet Movie Database (IMDb) contain a vast amount of data which have applications in predictive sentiment analysis for movie sales, stock market fluctuations, brand opinion, or current events. Using a dataset taken from IMDb by Stanford, we identify some of the most significant phrases for identifying sentiment in a wide variety of movie reviews. Data from Twitter are especially attractive due to Twitter’s real-time nature through its streaming API. Effectively analyzing these data in a streaming fashion requires efficient models, which may be improved by reducing the dimensionality of input vectors. One way this has been done in the past is by using emoticons; we propose a method for further reducing these features by identifying common structure in emoticons with similar sentiment. We also examine the gender distribution of emoticon usage, finding tendencies towards certain emoticons to be disproportionate between males and females. Despite the roughly equal gender distribution on Twitter, emoticon usage is predominantly female. Furthermore, we find that distributed vector representations, such as those produced by Word2Vec, may be reduced through feature selection. This analysis was done on a manually labeled sample of 1000 tweets from a new dataset, the Large Emoticon Corpus, which consisted of about 8.5 million tweets containing emoticons and was collected over a five-day period in May 2015. Additionally, using the common structure of similar emoticons, we are able to characterize positive and negative emoticons using two regular expressions which account for over 90% of emoticon usage in the Large Emoticon Corpus.
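The idea of collapsing structurally similar emoticons into a shared pattern can be sketched with two regular expressions. These specific patterns are illustrative stand-ins, not the two expressions from the paper:

```python
import re

# Hypothetical patterns: eyes [:;=], optional nose [-'], then a mouth class.
POSITIVE = re.compile(r"[:;=][-']?[)\]Dp]")   # matches :) ;-) =D :p ...
NEGATIVE = re.compile(r"[:;=][-']?[(\[/\\]")  # matches :( ;-[ =/ ...

def emoticon_sentiment(text):
    """Crude polarity judged from emoticon structure alone."""
    if POSITIVE.search(text):
        return "positive"
    if NEGATIVE.search(text):
        return "negative"
    return "neutral"

print(emoticon_sentiment("loved the movie :)"))    # positive
print(emoticon_sentiment("long queue today :-("))  # negative
print(emoticon_sentiment("no emoticons here"))     # neutral
```

Grouping by eye/nose/mouth structure is what lets two short patterns stand in for many distinct emoticon strings, reducing the feature dimensionality as described.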
The measurement and analysis of deformation of engineering structures such as dams, bridges and high-rise buildings are important tasks for civil engineers. It is evident that all civil engineering structures are susceptible to deterioration over a period of time. Bridges in particular deteriorate due to loading conditions, environmental changes, earth movement, the materials used during construction, age and corrosion of steel. Continuous monitoring of such structures is the most important aspect, as it provides quantitative information, assesses the state of the structure, detects unsafe positions and proposes early safety measures to be taken before deterioration can threaten the safety of vehicles, goods and human life. Despite governments’ efforts to construct roads and highways, bridge deformation monitoring has not been given priority in most African countries, which ultimately causes some bridges to collapse unexpectedly. The purpose of this research is to integrate the Global Positioning System (GPS) and Linear Variable Differential Transducers (LVDT) to monitor the deformation of a bridge. The horizontal positions of reference and monitoring points were determined using GPS, while the vertical deflections, accelerations and strain were determined using LVDT. The maximum displacements obtained between the zero and first epochs in the x, y and z components were 0.798 m at point LT08, 0.865 m at point BR13, and 0.56 m at point LT02 respectively. The maximum deflections for LVDT 1, 2 and 3 were 28.563 mm, 31.883 mm and 40.926 mm respectively. Finally, the correlation coefficient for the observations was 0.679, with standard deviations of 0.0168 and 0.0254 in x and y respectively.
Our results identified some slight displacements in the horizontal components at the bridge.
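The epoch-to-epoch displacements reported above are, at their core, coordinate differences between survey epochs; a minimal sketch of that computation (the point coordinates here are invented, not the survey's actual values):

```python
import math

def displacement(p0, p1):
    """3-D displacement between two epochs of the same monitoring point."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p0, p1)))

# Hypothetical monitoring point observed at epoch 0 and epoch 1 (metres).
epoch0 = (1000.000, 2000.000, 50.000)
epoch1 = (1000.300, 2000.400, 50.000)
print(displacement(epoch0, epoch1))  # 0.5 m (a 3-4-5 triangle, scaled)
```

Comparing such displacements against the positional accuracy of the GPS solution is what separates genuine structural movement from measurement noise.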
The rational layout of urban commercial space is conducive to optimizing the allocation of commercial resources within the urban interior space. Based on commercial POI (Point of Interest) data for the central district of Mianyang, the characteristics of the urban commercial spatial pattern at different scales are analyzed using Kernel Density Estimation, Getis-Ord statistics, Ripley’s K Function and the Location Entropy method, and the spatial agglomeration characteristics of the various commercial industries are studied. The results show that: 1) The spatial distribution characteristics of commercial outlets in downtown Mianyang are remarkable and show a multi-center distribution pattern. 2) A commercial grade-scale structure has formed in the central urban area as a whole, and the distribution of commercial hot spots based on road grid units is generally consistent with the identified commercial density centers. 3) From the perspective of the commercial industry, the “center-periphery” differentiation of urban commercial space is obvious, and different industries show different spatial agglomeration modes. 4) The multi-scale spatial agglomeration of each industry differs: the spatial scale of location choice for comprehensive retail, household appliances and similar industries is larger, while the scale of location choice for textiles, clothing, culture and sports is smaller. 5) There are significant differences in specialized functional areas from the perspective of industry.
Mature areas show multi-functional elements and the agglomeration of multiple advantageous industries, and a small number of developing areas also show these multi-advantage industry agglomeration characteristics.
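Kernel Density Estimation over POI coordinates, as used above, can be sketched in a few lines (Gaussian kernel with a fixed bandwidth; the points are synthetic, not the Mianyang POI data):

```python
import math

def kde(points, x, y, bandwidth=1.0):
    """Gaussian kernel density estimate at (x, y) from 2-D points."""
    h2 = bandwidth ** 2
    total = sum(math.exp(-((px - x) ** 2 + (py - y) ** 2) / (2 * h2))
                for px, py in points)
    return total / (len(points) * 2 * math.pi * h2)

# Synthetic POIs: a cluster near (0, 0) plus one distant outlier.
pois = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0.2), (10, 10)]
print(kde(pois, 0, 0) > kde(pois, 5, 5))  # True: density peaks at the cluster
```

Evaluating this surface on a grid and contouring it is what produces the "density center" hot-spot maps the results above refer to; the bandwidth controls the spatial scale of the pattern revealed.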
Influenza is an infectious disease that spreads quickly and widely, and influenza outbreaks have brought huge losses to society. In this paper, four major categories of flu keywords were set: “prevention phase”, “symptom phase”, “treatment phase”, and “commonly-used phrase”. A Python web crawler was used to obtain relevant influenza data from the National Influenza Center’s influenza surveillance weekly report and the Baidu Index. Support vector regression (SVR), least absolute shrinkage and selection operator (LASSO), and convolutional neural network (CNN) prediction models were established through machine learning, taking into account the seasonal characteristics of influenza, and a time series model (ARMA) was also established. The results show that it is feasible to predict influenza based on web search data, and that machine learning achieves a certain forecasting effect in such predictions, giving it reference value for future influenza prediction. The ARMA(3,0) model predicts better results and generalizes well. Finally, the limitations of this paper and future research directions are given.
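An ARMA(3,0) model is simply an AR(3) autoregression; a minimal least-squares fit of such a model on synthetic data (not the actual influenza surveillance series) can be sketched as:

```python
import numpy as np

def fit_ar(series, p=3):
    """Least-squares fit of an AR(p) model: x_t = c + sum(phi_i * x_{t-i})."""
    x = np.asarray(series, dtype=float)
    X = np.column_stack([np.ones(len(x) - p)] +
                        [x[p - i - 1:len(x) - i - 1] for i in range(p)])
    coefs, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coefs  # [c, phi_1, ..., phi_p]

rng = np.random.default_rng(0)
# Synthetic stationary AR(3) process with known coefficients.
true_phi = [0.5, -0.3, 0.2]
x = [0.0, 0.0, 0.0]
for _ in range(2000):
    x.append(sum(p * v for p, v in zip(true_phi, (x[-1], x[-2], x[-3])))
             + rng.normal())
c, p1, p2, p3 = fit_ar(x)
print(round(p1, 2), round(p2, 2), round(p3, 2))  # estimates near 0.5 -0.3 0.2
```

Once the phi coefficients are estimated, a one-step-ahead forecast is just the same weighted sum applied to the three most recent observations.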
Digital images have been studied for image classification, enhancement, compression and segmentation purposes. In the present work, it is proposed to study the effects of a feature selection algorithm on the predictive classification accuracy of algorithms used for discriminating different plant leaf images. The process involves extracting important texture features from the digital images and then subjecting them to feature selection and a further classification process. The leaf image features have been extracted using Gabor texture features, and these Gabor features are subjected to the Random Forest feature selection algorithm to extract the important texture features. Four classification algorithms, K-Nearest Neighbour, J48, Classification and Regression Trees and Random Forest, have been used for classification. This study shows that there is a net improvement in predictive classification accuracy when the classification algorithms are applied to the selected features rather than the complete set of features.
In view of the lack of patent big data in research on technology foresight in the industrial robot field, this paper introduces an improved method based on patent mining and knowledge maps. Firstly, SAO structures are extracted from selected patents; secondly, the similarity between patents is calculated based on the extracted SAO structures; thirdly, a patent network and patent map are drawn based on the calculated patent similarity matrix. The technology evolution process and future trends of industrial robots are summarized from the patent network, and potential future technology opportunities are predicted based on technological vacancies identified from the patent map. Finally, this paper identifies six key technical areas and four potential technical opportunities in the field of industrial robots.
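One simple way to turn extracted SAO (Subject-Action-Object) structures into a patent-to-patent similarity, as in the second step above, is a Jaccard index over the sets of triples. The triples below are invented examples, and the paper's actual similarity measure may differ:

```python
def sao_similarity(saos1, saos2):
    """Jaccard similarity between two patents' sets of SAO triples."""
    s1, s2 = set(saos1), set(saos2)
    if not s1 and not s2:
        return 0.0
    return len(s1 & s2) / len(s1 | s2)

patent_a = {("arm", "grips", "workpiece"), ("sensor", "detects", "force")}
patent_b = {("arm", "grips", "workpiece"), ("camera", "tracks", "position")}
print(sao_similarity(patent_a, patent_b))  # 1 shared / 3 total = 0.333...
```

Evaluating this pairwise over all selected patents yields the similarity matrix from which the patent network and patent map are then drawn.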
The ultimate aim of a smart city is to enhance the quality of life for its residents and businesses through modern technologies, in order to reduce resource deterioration and keep overall costs down. From this perspective, blockchain is one of the technologies that has received much attention in recent years, as it offers new alternatives for individuals and institutions in the smart city context. This study aims to explore the potential and contribution of blockchain in smart cities by studying and reviewing the scientific literature on the concept and fundamentals of blockchain, including its most practical applications. In addition, it summarizes worldwide examples of success in using blockchain and explores the challenges and opportunities related to this technology in smart cities. Thus, this study provides a useful reference for researchers reviewing the new blockchain technology.
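The core tamper-evidence property that makes blockchain attractive in the smart-city settings above can be sketched with a minimal hash chain (illustrative only; real blockchains add consensus, signatures, and distribution on top of this):

```python
import hashlib

def block_hash(index, data, prev_hash):
    """Hash a block's contents together with its predecessor's hash."""
    return hashlib.sha256(f"{index}|{data}|{prev_hash}".encode()).hexdigest()

def build_chain(records):
    """Each block stores the hash of its predecessor, forming a chain."""
    chain, prev = [], "0" * 64
    for i, data in enumerate(records):
        h = block_hash(i, data, prev)
        chain.append({"index": i, "data": data, "prev": prev, "hash": h})
        prev = h
    return chain

def is_valid(chain):
    """Recompute every hash; any edited block breaks the chain."""
    prev = "0" * 64
    for b in chain:
        if b["prev"] != prev or b["hash"] != block_hash(b["index"], b["data"], b["prev"]):
            return False
        prev = b["hash"]
    return True

chain = build_chain(["meter reading 42", "permit issued", "payment settled"])
print(is_valid(chain))        # True
chain[1]["data"] = "tampered"
print(is_valid(chain))        # False: the stored hash no longer matches
```

Because each block commits to its predecessor's hash, silently altering any historical record invalidates every later block, which is the integrity guarantee the reviewed applications rely on.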
Some existing fuzzy regression methods have special requirements for the object of study, such as assuming the observed values are symmetric triangular fuzzy numbers or imposing a non-negativity constraint on the regression parameters. In this paper, we propose a left-right fuzzy regression method which is applicable to various forms of observed values. We present a fuzzy distance and a partial order between two left-right (LR) fuzzy numbers, take the mean fuzzy distance between the observed and estimated values as the mean fuzzy error, and then minimize the mean fuzzy error to obtain the regression parameters. We adopt two criteria, involving the mean fuzzy error (comparative mean fuzzy error based on the partial order) and SSE, to compare the performance of our proposed method with other methods. Finally, four different types of numerical examples are given to illustrate that our proposed method is feasible and widely applicable.
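To make the notion of a distance between fuzzy numbers concrete, here is one simple choice for triangular fuzzy numbers represented as (left end, mode, right end). This is a generic illustrative distance, not the paper's proposed LR fuzzy distance, which is defined differently in full generality:

```python
import math

def tri_fuzzy_distance(a, b):
    """Euclidean-style distance between triangular fuzzy numbers (l, m, r)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)

A = (1.0, 2.0, 3.0)   # left end, mode, right end
B = (1.5, 2.5, 3.5)
print(tri_fuzzy_distance(A, A))  # 0.0: identical fuzzy numbers
print(tri_fuzzy_distance(A, B))  # 0.5: a uniform shift of 0.5
```

Averaging such distances between observed and estimated fuzzy values gives a mean fuzzy error of the kind the method minimizes to fit the regression parameters.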
The world is experiencing a strong rush towards modern technology, while specialized companies are in a headlong rush in information technology towards the so-called Internet of Things (IoT), or Internet of objects: the integration of things with the world of the Internet by adding hardware and/or software so that they become smart, able to communicate with each other and to participate effectively in all aspects of daily life. This enables new forms of communication between people and things, and between things themselves, which will change traditional life into a higher style of living. But it won’t be easy, because there are still many challenges and issues that need to be addressed, and they have to be viewed from various aspects to realize their full potential. The main objective of this review paper is to provide the reader with a detailed discussion from a technological and social perspective. The various IoT challenges and issues, its definition and its architecture are discussed, along with a description of several sensors and actuators and their smart communication. The most important application areas of IoT are also presented. This work will help readers and researchers understand the IoT and its potential application in the real world.
This research applies network structuring theories to the aviation domain and predicts aviation network growth, considering a flight connection between airports as a link between nodes. Our link prediction approach is based on network structure information, and to improve prediction accuracy, it is necessary to estimate the mechanism of aviation network growth. This research critically evaluates the prediction accuracy of two methods: the receiver operating characteristic (ROC) curve method and the logistic regression method. We propose a four-step method to evaluate the relative predictive accuracy of different link prediction methods. A case study of US aviation networks indicated that the ROC method provided better prediction accuracy than the logistic regression method. This result suggests that tuning the prediction distribution and the regression model coefficients can further improve the accuracy of the logistic regression method.
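For a set of scored candidate links, the ROC comparison above reduces to computing the area under the ROC curve, which equals the probability that a true link is scored above a non-link. A rank-based AUC sketch (toy scores, not the US aviation data):

```python
def auc(scores_pos, scores_neg):
    """AUC = probability a true link outscores a non-link (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Predicted link scores: existing links (positives) vs. absent links.
positives = [0.9, 0.8, 0.75]
negatives = [0.4, 0.3, 0.8]
print(auc(positives, negatives))  # 7.5 correctly ranked pairs out of 9
```

Because AUC depends only on the ranking of scores, it compares link prediction methods without requiring a calibrated probability model, unlike the logistic regression approach.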
This research was an effort to select the best imputation method for missing upper-air temperature data over 24 standard pressure levels. We implemented four imputation techniques: inverse distance weighting, bilinear, natural and nearest-neighbour interpolation. The performance indicators adopted in this research were the root mean square error (RMSE), absolute mean error (AME), correlation coefficient and coefficient of determination (R²). We randomly made 30% of the total samples (324 in all) predictable from the remaining 70% of the data. Although all four interpolation methods seem good for imputing air temperature data (producing RMSE and AME below 1), the bilinear method was the most accurate, with the smallest errors. The RMSE for the bilinear method remains below 0.01 at all pressure levels except 1000 hPa, where the value was 0.6. Low AME values (below 0.1) were obtained at all pressure levels through bilinear imputation. A very strong correlation (above 0.99) was found between the actual and predicted air temperature data with this method, and the high coefficient of determination (0.99) for bilinear interpolation indicates the best fit to the surface. We found similar results for imputation with the natural interpolation method, but after investigating scatter plots for each month, imputations with this method appear slightly worse in certain months than the bilinear method.
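The hold-out comparison described can be sketched on a 1-D profile: withhold some levels, impute them by nearest-neighbour and by linear interpolation (the 1-D analogue of bilinear), and compare RMSE. The values are synthetic, not the radiosonde data:

```python
import math

def nearest(xs, ys, x):
    """Value at the closest known coordinate."""
    return min(zip(xs, ys), key=lambda p: abs(p[0] - x))[1]

def linear(xs, ys, x):
    """Linear interpolation between the bracketing known points."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside profile")

def rmse(truth, pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

# Temperature falling linearly with height; hold out the mid-levels.
known_x, known_y = [0, 2, 4, 6], [15.0, 11.0, 7.0, 3.0]
held_x, held_y = [1, 3, 5], [13.0, 9.0, 5.0]
lin = [linear(known_x, known_y, x) for x in held_x]
nn = [nearest(known_x, known_y, x) for x in held_x]
print(rmse(held_y, lin))  # 0.0: exact for a linear profile
print(rmse(held_y, nn))   # 2.0: nearest-neighbour error on the same points
```

The gap between the two RMSE values is the kind of evidence that led the study to prefer bilinear over nearest-neighbour interpolation.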
To make business policy, perform market analysis, reach corporate decisions, detect fraud, etc., we have to analyze and work with huge amounts of data, generally taken from different sources. Researchers use data mining to perform such tasks: data mining techniques are used to find hidden information in large data sources, and data mining is applied in various fields, including artificial intelligence, banking, health and medicine, corruption, legal issues, corporate business, and marketing. Special interest is given to association rules, data mining algorithms, decision trees and distributed approaches. Data are becoming larger and more geographically spread, so it is difficult to obtain good results from a single central data source; for knowledge discovery, we have to work with distributed databases. On the other hand, security and privacy considerations are another factor discouraging work with centralized data. For these reasons, distributed databases are essential for future processing. In this paper, we propose a framework to study data mining in a distributed environment and to bring out actionable knowledge. We show the levels by which actionable knowledge can be generated, and possible tools and techniques for these levels are discussed.
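Association rules, mentioned above as a special interest, rest on two simple counts; a minimal sketch of computing support and confidence over a toy transaction list:

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the full rule divided by support of the antecedent."""
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
print(support(baskets, {"bread", "milk"}))       # 2/4 = 0.5
print(confidence(baskets, {"bread"}, {"milk"}))  # 2/3
```

In a distributed setting, these counts can be accumulated per site and merged centrally, which is one reason association rules suit the distributed framework proposed here.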
In this paper, we investigate the effectiveness of ensemble-based learners for web robot session identification from web server logs. We also perform multi-fold robot session labeling to improve the performance of the learners. We conduct a comparative study of various ensemble methods (Bagging, Boosting, and Voting) against simple classifiers from a classification perspective, and we evaluate the effectiveness of these classifiers (both ensemble and simple) on five different datasets of varying session length. Presently, the results of web server log analyzers are not very reliable, because the input log files are highly inflated by sessions of automated web traversal software known as web robots. The presence of web robot traffic entries in web server log repositories imposes a great challenge to extracting any actionable and usable knowledge about the browsing behavior of actual visitors. Web robot sessions therefore need accurate and fast detection in web server log repositories in order to extract knowledge about genuine visitors and to produce correct results from log analyzers.
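The Voting ensemble studied above can be sketched as a majority vote over base classifiers. The "classifiers" here are trivial hypothetical stand-ins keyed on crude robot signals, just to show the combination step:

```python
from collections import Counter

def vote(classifiers, sample):
    """Hard-voting ensemble: each classifier casts one vote per sample."""
    votes = [clf(sample) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins: each flags a robot session on one crude signal.
fast_requests = lambda s: "robot" if s["req_per_min"] > 60 else "human"
no_referrer   = lambda s: "robot" if not s["referrer"] else "human"
head_heavy    = lambda s: "robot" if s["head_ratio"] > 0.5 else "human"

session = {"req_per_min": 120, "referrer": "", "head_ratio": 0.1}
print(vote([fast_requests, no_referrer, head_heavy], session))  # robot (2 of 3)
```

The appeal for robot detection is that no single heuristic must be right: a session is flagged only when a majority of independent signals agree.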
The object of our present study is to develop a piecewise constant hazard model using an Artificial Neural Network (ANN) to capture the complex shapes of hazard functions, which cannot be achieved with conventional survival analysis models like the Cox proportional hazards model. We propose a more convenient approach to the PEANN created by Fornili et al. for handling large amounts of data. In particular, it provides much better prediction accuracy than both Poisson regression and generalized estimating equations. This has been demonstrated with lung cancer patient data taken from the Surveillance, Epidemiology and End Results (SEER) program. The quality of the proposed model is evaluated using several error measurement criteria.
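Under a piecewise constant hazard model, the survival function is S(t) = exp(−Σ_j h_j · Δt_j), summing each interval's hazard rate times the time spent in it up to t. A sketch of that bookkeeping (the interval cut-points and hazard rates are illustrative, not fitted values):

```python
import math

def survival(t, cuts, hazards):
    """S(t) for piecewise constant hazards on [0, c1), [c1, c2), ... ."""
    cum, prev = 0.0, 0.0
    for cut, h in zip(cuts + [float("inf")], hazards):
        width = min(t, cut) - prev   # time spent in this interval before t
        if width <= 0:
            break
        cum += h * width             # accumulate cumulative hazard
        prev = cut
    return math.exp(-cum)

# Hazard 0.1/year for the first 2 years, then 0.3/year afterwards.
print(survival(1.0, [2.0], [0.1, 0.3]))  # exp(-0.1)
print(survival(3.0, [2.0], [0.1, 0.3]))  # exp(-0.1*2 - 0.3*1)
```

In the ANN variant, the constant rates h_j are replaced by network outputs per interval and covariate pattern, which is what lets the model capture complex hazard shapes.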
We show a quantitative technique, characterized by low numerical mediation, for the reconstruction of temporal sequences of geophysical data of length L interrupted for a time ΔT. The aim is to protect the information acquired before and after the interruption by means of a numerical protocol with the lowest possible computational weight. The signal reconstruction process is based on the synthesis of the low-frequency signal extracted by subsampling (subsampling ∇Dirac = ΔT, in phase with ΔT) with the high-frequency signal recorded before the crash. The SYRec (SYnthetic REConstruction) method, for its simplicity and speed of calculation and for its spectral response stability, is particularly effective in studies of high-speed transient phenomena that develop in very perturbed fields. This operative condition is fundamental when almost immediate informational responses are required from the observation system. In this example we deal with geomagnetic data coming from an underwater counter-intrusion magnetic system. The system produces (in time) information about the transit of local magnetic singularities (magnetic perturbations with low spatial extension), originated by quasi-point-form kinematic sources (divers), in harbor underwater magnetic fields. The stability performance of the SYRec system makes it usable also over long and medium periods of observation (the activity of geomagnetic observatories).
文摘This study explored and reviewed the logistic regression (LR) model, a multivariable method for modeling the relationship between multiple independent variables and a categorical dependent variable, with emphasis on medical research. Thirty seven research articles published between 2000 and 2018 which employed logistic regression as the main statistical tool as well as six text books on logistic regression were reviewed. Logistic regression concepts such as odds, odds ratio, logit transformation, logistic curve, assumption, selecting dependent and independent variables, model fitting, reporting and interpreting were presented. Upon perusing the literature, considerable deficiencies were found in both the use and reporting of LR. For many studies, the ratio of the number of outcome events to predictor variables (events per variable) was sufficiently small to call into question the accuracy of the regression model. Also, most studies did not report on validation analysis, regression diagnostics or goodness-of-fit measures;measures which authenticate the robustness of the LR model. Here, we demonstrate a good example of the application of the LR model using data obtained on a cohort of pregnant women and the factors that influence their decision to opt for caesarean delivery or vaginal birth. It is recommended that researchers should be more rigorous and pay greater attention to guidelines concerning the use and reporting of LR models.
文摘In this paper, sixty-eight research articles published between 2000 and 2017 as well as textbooks which employed four classification algorithms: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN) as the main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths and shortcomings of each of the algorithms were examined, and finally, a conclusion was arrived at on which one has higher performance. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset and is occasionally unstable and tends to overfit in the model. KNN is easy to implement and understand but has a major drawback of becoming significantly slow as the size of the data in use grows, while the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability in dealing with unbalanced data. Larger input datasets will lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of their time-consuming parameter tuning procedure, high level of complexity in computational processing, the numerous types of NN architectures to choose from and the high number of algorithms used for training, most researchers recommend SVM and RF as easier and wieldy used methods which repeatedly achieve results with high accuracies and are often faster to implement.
The burgeoning use of Web 2.0-powered social media in recent years has inspired numerous studies on the content and composition of online social networks (OSNs). Many methods of harvesting useful information from social networks’ immense amounts of user-generated data have been successfully applied to such real-world topics as politics and marketing, to name just a few. This study presents a novel twist on two popular techniques for studying OSNs: community detection and sentiment analysis. Using sentiment classification to enhance community detection and community partitions to permit more in-depth analysis of sentiment data, these two techniques are brought together to analyze four networks from the Twitter OSN. The Twitter networks used for this study are extracted from four accounts related to Microsoft Corporation, and together encompass more than 60,000 users and 2 million tweets collected over a period of 32 days. By combining community detection and sentiment analysis, modularity values were increased for the community partitions detected in three of the four networks studied. Furthermore, data collected during the community detection process enabled more granular, community-level sentiment analysis on a specific topic referenced by users in the dataset.
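Modularity, the quantity the study reports improving, measures how much denser within-community edges are than a random baseline. A minimal sketch on a hypothetical six-node graph (not the paper's Twitter data):

```python
def modularity(edges, community):
    # Newman modularity Q of a partition: fraction of within-community
    # edges minus the fraction expected under a random degree-preserving rewiring
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = 0.0
    for i in deg:
        for j in deg:
            if community[i] != community[j]:
                continue
            a_ij = sum(1 for e in edges if e == (i, j) or e == (j, i))
            q += a_ij - deg[i] * deg[j] / (2 * m)
    return q / (2 * m)

# two triangles joined by a single bridge edge c-d
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
part = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
```

Splitting the graph at the bridge gives a clearly positive Q, which is the kind of value community detection tries to maximize.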
Double Q-learning has been shown to be effective in reinforcement learning scenarios when the reward system is stochastic. We apply the idea of double learning that this algorithm uses to Sarsa and Expected Sarsa, producing two new algorithms called Double Sarsa and Double Expected Sarsa that are shown to be more robust than their single counterparts when rewards are stochastic. We find that these algorithms add a significant amount of stability to the learning process at only a minor computational cost, which leads to higher returns when using an on-policy algorithm. We then use shallow and deep neural networks to approximate the action-value, and show that Double Sarsa and Double Expected Sarsa are much more stable after convergence and can collect larger rewards than the single versions.
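The double-learning idea can be sketched as a single tabular update step: two value tables are kept, and the one chosen for update uses the *other* table's estimate of the next state-action pair. This is an illustrative sketch of the mechanism, not the paper's full algorithm (which also covers action selection and the Expected Sarsa variant); alpha and gamma values are arbitrary.

```python
import random

def double_sarsa_update(qa, qb, s, a, r, s2, a2, alpha=0.5, gamma=0.9):
    # double learning: randomly pick one table and update it toward a
    # target built from the OTHER table's next-step estimate
    if random.random() < 0.5:
        target = r + gamma * qb.get((s2, a2), 0.0)
        qa[(s, a)] = qa.get((s, a), 0.0) + alpha * (target - qa.get((s, a), 0.0))
    else:
        target = r + gamma * qa.get((s2, a2), 0.0)
        qb[(s, a)] = qb.get((s, a), 0.0) + alpha * (target - qb.get((s, a), 0.0))
```

Because the bootstrap target never comes from the table being updated, the overestimation bias of single Sarsa under stochastic rewards is reduced.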
Cronbach’s alpha coefficient is the most popular method of examining reliability. It is typically used when the researcher has several Likert-type items that are summed or averaged to make a composite score. The distribution of the alpha coefficient has been the subject of many studies. In this study, the relationship between randomness and Cronbach’s alpha coefficient was investigated; in this context, the present study examined the question “What is the distribution of the coefficient alpha when a Likert-type scale is answered randomly?” Data were generated in the form of five-point Likert-type items, and a Monte Carlo simulation was run 5000 times for different numbers of items.
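One iteration of such a simulation can be sketched directly from the standard alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The respondent and item counts below are arbitrary illustration values, not the study's design.

```python
import random
from statistics import pvariance

def cronbach_alpha(scores):
    # scores: one row per respondent, one column per Likert item
    k = len(scores[0])
    item_vars = sum(pvariance([row[i] for row in scores]) for i in range(k))
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_vars / total_var)

random.seed(42)
# 200 respondents answering 10 five-point items completely at random
rand = [[random.randint(1, 5) for _ in range(10)] for _ in range(200)]
alpha = cronbach_alpha(rand)
```

With purely random answers the items share no common variance, so alpha lands near zero, whereas perfectly correlated items push it to 1.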
Social media platforms such as Twitter and the Internet Movie Database (IMDb) contain a vast amount of data which have applications in predictive sentiment analysis for movie sales, stock market fluctuations, brand opinion, or current events. Using a dataset taken from IMDb by Stanford, we identify some of the most significant phrases for identifying sentiment in a wide variety of movie reviews. Data from Twitter are especially attractive due to Twitter’s real-time nature through its streaming API. Effectively analyzing these data in a streaming fashion requires efficient models, which may be improved by reducing the dimensionality of input vectors. One way this has been done in the past is by using emoticons; we propose a method for further reducing these features by identifying common structure in emoticons with similar sentiment. We also examine the gender distribution of emoticon usage, finding that tendencies towards certain emoticons are disproportionate between males and females. Despite the roughly equal gender distribution on Twitter, emoticon usage is predominantly female. Furthermore, we find that distributed vector representations, such as those produced by Word2Vec, may be reduced through feature selection. This analysis was done on a manually labeled sample of 1000 tweets from a new dataset, the Large Emoticon Corpus, which consists of about 8.5 million tweets containing emoticons and was collected over a five-day period in May 2015. Additionally, using the common structure of similar emoticons, we are able to characterize positive and negative emoticons using two regular expressions which account for over 90% of emoticon usage in the Large Emoticon Corpus.
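The "common structure" idea — eyes, optional nose, then a mouth whose shape carries the sentiment — can be sketched with two regular expressions. These patterns are illustrative stand-ins in the spirit of the paper's two regexes, not the actual expressions derived from the Large Emoticon Corpus.

```python
import re

# hypothetical patterns: eyes [:;=8], optional nose, sentiment-bearing mouth
POSITIVE = re.compile(r"[:;=8][-'^o]?[)\]dDpP>]")
NEGATIVE = re.compile(r"[:;=8][-'^o]?[(\[/\\<{|]")

def emoticon_sentiment(text):
    # collapse many surface emoticon variants into two structural classes
    if POSITIVE.search(text):
        return "positive"
    if NEGATIVE.search(text):
        return "negative"
    return "neutral"
```

Collapsing variants like `:)`, `:-)` and `;)` into one class is exactly the kind of dimensionality reduction the abstract describes.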
The measurement and analysis of deformation of engineering structures such as dams, bridges and high-rise buildings are important tasks for civil engineers. It is evident that all civil engineering structures are susceptible to deterioration over time. Bridges in particular deteriorate due to loading conditions, environmental changes, earth movement, the materials used during construction, age and corrosion of steel. Continuous monitoring of such structures is essential, as it provides quantitative information, assesses the state of the structure, detects unsafe conditions and proposes early safety measures to be taken before the structure can threaten the safety of vehicles, goods and human life. Despite governments’ efforts to construct roads and highways, bridge deformation monitoring has not been given priority in most African countries, which ultimately causes some bridges to collapse unexpectedly. The purpose of this research is to integrate the Global Positioning System (GPS) and Linear Variable Differential Transducers (LVDT) to monitor the deformation of a bridge. The horizontal positions of reference and monitoring points were determined using GPS, while the vertical deflections, accelerations and strain were determined using LVDT. The maximum displacements obtained between the zero and first epochs in the x, y and z components were 0.798 m at point LT08, 0.865 m at point BR13, and 0.56 m at point LT02, respectively. The maximum deflections for LVDT 1, 2 and 3 were 28.563 mm, 31.883 mm and 40.926 mm, respectively. Finally, the correlation coefficient for the observations was 0.679, with standard deviations of 0.0168 and 0.0254 in x and y, respectively. Our results identified some slight displacements in the horizontal components of the bridge.
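The two numerical quantities reported above — per-axis displacement between epochs and the correlation coefficient between observation series — reduce to short formulas. The coordinates below are invented for illustration, not the study's survey data.

```python
import math

def displacement(p0, p1):
    # per-axis displacement of a monitoring point between two epochs
    return tuple(b - a for a, b in zip(p0, p1))

def pearson(xs, ys):
    # Pearson correlation coefficient between two observation series
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```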
The rational layout of urban commercial space is conducive to optimizing the allocation of commercial resources within the urban interior. Based on commercial POI (Point of Interest) data for the central district of Mianyang, the characteristics of the urban commercial spatial pattern at different scales are analyzed using Kernel Density Estimation, the Getis-Ord statistic, Ripley’s K Function and the Location Entropy method, and the spatial agglomeration characteristics of the various commercial industries are studied. The results show that: 1) The spatial distribution of commercial outlets in downtown Mianyang is distinctive and shows a multi-center pattern; the hot-spot distribution of commercial outlets based on road grid units is generally consistent with the identified commercial density centers. 2) A commercial grade-scale structure has formed in the central urban area as a whole. 3) From the perspective of commercial industries, the “center-periphery” differentiation of urban commercial space is obvious, and different industries show different spatial agglomeration modes. 4) The multi-scale spatial agglomeration of each industry differs: comprehensive retail, household appliances and similar industries choose locations at a larger spatial scale, while textiles, clothing, culture and sports choose locations at a smaller scale. 5) There are significant differences in specialized functional areas across industries: mature areas show multi-functional elements and the agglomeration of multiple advantaged industries, and a small number of developing areas also show the agglomeration of multiple advantaged industries.
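Kernel Density Estimation, the first method listed, turns discrete POI coordinates into a smooth density surface whose peaks mark commercial centers. A minimal 2-D Gaussian-kernel sketch on made-up coordinates (bandwidth is a free parameter, as in the study):

```python
import math

def kernel_density(points, x, y, bandwidth=1.0):
    # Gaussian kernel density estimate at (x, y) from POI coordinates:
    # nearby points contribute more, controlled by the bandwidth
    h2 = bandwidth ** 2
    total = sum(math.exp(-((px - x) ** 2 + (py - y) ** 2) / (2 * h2))
                for px, py in points)
    return total / (2 * math.pi * h2 * len(points))
```

Evaluating the estimate on a grid and contouring it produces the multi-center density map the results describe.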
Influenza is an infectious disease that spreads quickly and widely, and its outbreaks have brought huge losses to society. In this paper, four major categories of flu keywords were set: “prevention phase”, “symptom phase”, “treatment phase”, and “commonly-used phrase”. A Python web crawler was used to obtain relevant influenza data from the National Influenza Center’s weekly influenza surveillance report and the Baidu Index. Support vector regression (SVR), least absolute shrinkage and selection operator (LASSO) and convolutional neural network (CNN) prediction models were established through machine learning, taking into account the seasonal characteristics of influenza, and a time-series model (ARMA) was also established. The results show that it is feasible to predict influenza from web search data, and that machine learning shows a certain forecasting effect in influenza prediction based on web search data; in the future it will have reference value for influenza prediction. The ARMA(3,0) model produces better predictions and generalizes well. Finally, the limitations of this study and directions for future research are given.
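Both the machine-learning models and the ARMA(3,0) model predict the next observation from recent history. The supervised framing can be sketched as a lag-feature construction (the weekly counts below are invented):

```python
def lag_features(series, n_lags=3):
    # turn a weekly count series into supervised (X, y) pairs: the previous
    # n_lags observations predict the next one, mirroring the order-3
    # autoregressive part of an ARMA(3,0) model
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y
```

The same (X, y) pairs can then be fed to SVR, LASSO or a CNN, with search-index values appended to each row as extra features.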
Digital images have been studied for image classification, enhancement, compression and segmentation purposes. The present work studies the effects of a feature selection algorithm on the predictive classification accuracy of algorithms used for discriminating between different plant leaf images. The process involves extracting the important texture features from the digital images and then subjecting them to feature selection and a further classification process. The leaf image features were extracted using Gabor texture features, and these Gabor features were subjected to a Random Forest feature selection algorithm to extract the important texture features. Four classification algorithms, K-Nearest Neighbour, J48, Classification and Regression Trees and Random Forest, were used for the classification task. This study shows a net improvement in predictive classification accuracy when the classification algorithms are applied to the selected features rather than the complete feature set.
In view of the lack of patent big data in research on technology foresight in the industrial robot field, this paper introduces an improved method based on patent mining and knowledge maps. Firstly, SAO structures are extracted from selected patents; secondly, the similarity between patents is calculated based on the extracted SAO structures; thirdly, a patent network and a patent map are drawn based on the calculated patent similarity matrix. The technology evolution process and future trends of industrial robots are summarized from the patent network, and potential future technology opportunities are predicted from the technological vacancies identified in the patent map. Finally, this paper identifies six key technical areas and four potential technical opportunities in the field of the industrial robot.
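The second step, similarity between patents from their SAO (Subject-Action-Object) structures, can be sketched as set overlap. The Jaccard measure and the example triples below are illustrative assumptions; the paper's SAO-based similarity is not necessarily this exact formula.

```python
def sao_similarity(sao_a, sao_b):
    # Jaccard overlap between two patents' SAO triple sets
    a, b = set(sao_a), set(sao_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# hypothetical SAO triples extracted from two robot patents
p1 = {("arm", "grips", "workpiece"), ("sensor", "detects", "position")}
p2 = {("arm", "grips", "workpiece"), ("camera", "detects", "defect")}
```

Computing this score for every patent pair yields the similarity matrix from which the patent network and map are drawn.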
The ultimate aim of a smart city is to enhance the quality of life for its residents and businesses through modern technologies, in order to reduce resource deterioration and control overall costs. From this perspective, blockchain is one of the technologies that has received much attention in recent years, as it offers new alternatives for individuals and institutions in the smart-city context. This study aims to explore the potential and contribution of blockchain in smart cities by reviewing the scientific literature on the concept and fundamentals of blockchain, including its most practical applications. In addition, it summarizes worldwide examples of success in using blockchain and explores the challenges and opportunities related to this technology in smart cities. This study thus provides a useful reference for researchers on blockchain technology.
Some existing fuzzy regression methods have special requirements for the object of study, such as assuming the observed values are symmetric triangular fuzzy numbers or imposing a non-negativity constraint on the regression parameters. In this paper, we propose a left-right fuzzy regression method that is applicable to various forms of observed values. We present a fuzzy distance and a partial order between two left-right (LR) fuzzy numbers, take the mean fuzzy distance between the observed and estimated values as the mean fuzzy error, and minimize the mean fuzzy error to obtain the regression parameters. We adopt two criteria, the mean fuzzy error (comparative mean fuzzy error based on the partial order) and SSE, to compare the performance of the proposed method with other methods. Finally, four different types of numerical examples are given to illustrate that the proposed method is feasible and widely applicable.
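The fitting criterion can be sketched once a fuzzy distance is fixed. The root-mean-square distance over the (left, mode, right) points of triangular fuzzy numbers used below is a common textbook choice, not necessarily the distance the paper proposes; it only illustrates how "mean fuzzy error" becomes a minimizable number.

```python
def fuzzy_distance(a, b):
    # distance between two triangular fuzzy numbers given as (left, mode, right)
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / 3) ** 0.5

def mean_fuzzy_error(observed, estimated):
    # the scalar criterion a fuzzy regression fit would minimize
    return sum(fuzzy_distance(o, e) for o, e in zip(observed, estimated)) / len(observed)
```

A fit then searches over regression parameters for the estimated fuzzy outputs that make this mean error smallest.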
The world is experiencing a strong rush towards modern technology, and specialized companies are racing toward the so-called Internet of Things (IoT), or Internet of objects: the integration of things with the world of the Internet by adding hardware and/or software so that they become smart, can communicate with each other and can participate effectively in all aspects of daily life. This enables new forms of communication between people and things, and between things themselves, which will change traditional life into a higher standard of living. But it will not be easy, because there are still many challenges and issues that need to be addressed and viewed from various aspects before the IoT’s full potential can be realized. The main objective of this review paper is to provide the reader with a detailed discussion from a technological and social perspective. The various IoT challenges and issues, its definition and its architecture are discussed, along with a description of several sensors and actuators and their smart communication. The most important application areas of the IoT are also presented. This work will help readers and researchers understand the IoT and its potential application in the real world.
This research applies network structuring theories to the aviation domain and predicts aviation network growth, considering a flight connection between airports as a link between nodes. Our link prediction approach is based on network structure information, and to improve prediction accuracy, it is necessary to estimate the mechanism of aviation network growth. This research critically evaluates the prediction accuracy of two methods: the receiver operating characteristic curve method (ROC) and the logistic regression method. We propose a four-step method to evaluate the relative predictive accuracy among different link prediction methods. A case study of US aviation networks indicated that the ROC method provided better prediction accuracy compared with the logistic regression method. This result suggests that tuning of the prediction distribution and the regression model coefficients can further improve the accuracy of the logistic regression method.
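Structural link prediction and its ROC evaluation can be sketched together: score candidate airport pairs by a network-structure feature (common neighbors is one standard choice, used here as an illustrative assumption), then measure how often true future links outscore non-links, which is exactly the ROC AUC.

```python
def common_neighbors(adj, u, v):
    # structural score: airports sharing many connections are more
    # likely to gain a direct flight
    return len(adj.get(u, set()) & adj.get(v, set()))

def roc_auc(pos_scores, neg_scores):
    # probability that a true (appearing) link outscores a non-link,
    # counting ties as half a win
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```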
This research was an effort to select the best imputation method for missing upper-air temperature data over 24 standard pressure levels. We implemented four imputation techniques: inverse distance weighting and bilinear, natural and nearest-neighbour interpolation. The performance indicators adopted in this research were the root mean square error (RMSE), absolute mean error (AME), correlation coefficient and coefficient of determination (R²). We randomly held out 30% of the total of 324 samples and predicted them from the remaining 70%. Although all four interpolation methods performed well (producing RMSE and AME below 1) for imputing air temperature data, the bilinear method was the most accurate, with the smallest errors. The RMSE for the bilinear method remained below 0.01 at all pressure levels except 1000 hPa, where it was 0.6. AME values were low (<0.1) at all pressure levels for bilinear imputation. A very strong correlation (>0.99) was found between the actual and predicted air temperature data with this method, and the high coefficient of determination (0.99) indicates the best fit to the surface. We found similar results for imputation with the natural interpolation method, but after examining scatter plots for each month, imputations with this method appear somewhat less accurate in certain months than the bilinear method.
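Inverse distance weighting, the first technique listed, and the RMSE indicator both have short closed forms. A 1-D sketch (real upper-air imputation would weight neighbours in more dimensions; the sample values are invented):

```python
def idw(known, x, power=2):
    # inverse distance weighting: estimate the value at position x from
    # (position, value) pairs, weighting each by 1 / distance**power
    num = den = 0.0
    for xi, vi in known:
        d = abs(x - xi)
        if d == 0:
            return vi  # exact hit: return the known value
        w = 1.0 / d ** power
        num += w * vi
        den += w
    return num / den

def rmse(actual, predicted):
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5
```

Holding out known samples, imputing them, and computing `rmse` against the truth is the validation scheme the study applies to its 30% holdout.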
To make business policy, perform market analysis, support corporate decisions, detect fraud, etc., we have to analyze and work with huge amounts of data, generally taken from different sources. Researchers use data mining to perform such tasks. Data mining techniques are used to find hidden information in large data sources, and data mining is applied in various fields: artificial intelligence, banking, health and medicine, corruption, legal issues, corporate business, marketing, etc. Special interest is given to association rules, data mining algorithms, decision trees and distributed approaches. Data is becoming larger and more geographically spread, so it is difficult to obtain good results from a single central data source; for knowledge discovery, we have to work with distributed databases. On the other hand, security and privacy considerations are another disincentive to working with centralized data. For these reasons, distributed databases are essential for future processing. In this paper, we propose a framework for studying data mining in a distributed environment. The paper presents a framework for bringing out actionable knowledge, shows the levels by which actionable knowledge can be generated, and discusses possible tools and techniques for these levels.
In this paper we investigate the effectiveness of ensemble-based learners for web robot session identification from web server logs. We also perform multi-fold robot session labeling to improve learner performance. We conduct a comparative study of various ensemble methods (Bagging, Boosting, and Voting) against simple classifiers from a classification perspective, and evaluate the effectiveness of these classifiers (both ensemble and simple) on five different data sets of varying session length. At present, the results of web server log analyzers are not very reliable, because the input log files are highly inflated by sessions of automated web traversal software known as web robots. The presence of web robot traffic entries in web server log repositories poses a great challenge to extracting any actionable and usable knowledge about the browsing behavior of actual visitors. Web robot sessions therefore need accurate and fast detection in web server log repositories in order to extract knowledge about genuine visitors and to produce correct log analyzer results.
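The Voting ensemble named above combines base classifiers by majority label per sample. A minimal hard-voting sketch (the robot/human labels are invented example output, not the paper's data):

```python
from collections import Counter

def majority_vote(model_predictions):
    # hard voting: each inner list is one base classifier's labels for all
    # samples; the ensemble emits the most common label per sample
    return [Counter(sample).most_common(1)[0][0]
            for sample in zip(*model_predictions)]
```

Bagging and Boosting differ only in how the base classifiers are trained; their outputs can be combined by the same vote.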
The object of our present study is to develop a piecewise-constant hazard model that uses an Artificial Neural Network (ANN) to capture complex shapes of the hazard function, which cannot be achieved with conventional survival analysis models like the Cox proportional hazards model. We propose a more convenient approach than the PEANN created by Fornili et al. for handling large amounts of data. In particular, it provides much better prediction accuracy than both Poisson regression and generalized estimating equations. This is demonstrated with lung cancer patient data taken from the Surveillance, Epidemiology and End Results (SEER) program. The quality of the proposed model is evaluated using several error measurement criteria.
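The piecewise-constant hazard structure itself is simple: the hazard rate is flat on each time interval (the ANN's role in the model is to supply those rates), and survival follows from the cumulative hazard via S(t) = exp(-H(t)). A sketch with arbitrary illustrative rates and break points:

```python
import math

def cumulative_hazard(rates, breaks, t):
    # piecewise-constant hazard: rates[i] applies on [breaks[i], breaks[i+1])
    total = 0.0
    for i, rate in enumerate(rates):
        lo, hi = breaks[i], breaks[i + 1]
        if t <= lo:
            break
        total += rate * (min(t, hi) - lo)
    return total

def survival(rates, breaks, t):
    # S(t) = exp(-H(t))
    return math.exp(-cumulative_hazard(rates, breaks, t))
```

Letting a network output a different rate per interval is what allows the model to trace hazard shapes a proportional-hazards fit cannot.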
We show a quantitative technique, characterized by low numerical mediation, for the reconstruction of temporal sequences of geophysical data of length L interrupted for a time ΔT. The aim is to protect the information acquired before and after the interruption by means of a numerical protocol with the lowest possible computational weight. The signal reconstruction process is based on the synthesis of the low-frequency signal extracted by subsampling (∇Dirac = ΔT, in phase with ΔT) with the high-frequency signal recorded before the interruption. The SYRec (SYnthetic REConstruction) method, for its simplicity and speed of calculation and for its stable spectral response, is particularly effective in studies of high-speed transient phenomena that develop in strongly perturbed fields. This operating condition is fundamental when almost immediate informational responses are required from the observation system. In this example we deal with geomagnetic data coming from an underwater counter-intrusion magnetic system. The system produces (in real time) information about the transit of local magnetic singularities (magnetic perturbations of small spatial extent), originated by quasi-point-like kinematic sources (divers), in harbor underwater magnetic fields. The stability of the SYRec system also makes it usable over long and medium observation periods (routine activity of geomagnetic observatories).
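The low/high-frequency synthesis idea can be sketched in a heavily simplified form: fill the gap with a low-frequency trend across it plus the detrended high-frequency residual of the samples recorded just before the gap. This is an illustrative stand-in for the idea only, not the SYRec protocol.

```python
def reconstruct_gap(signal, start, end):
    # fill signal[start:end]: low-frequency part is a straight line across
    # the gap; high-frequency part is the detrended residual of the same
    # number of samples recorded just before the gap
    n = end - start
    lo = [signal[start - 1] + (signal[end] - signal[start - 1]) * (i + 1) / (n + 1)
          for i in range(n)]
    pre = signal[start - 1 - n:start - 1]  # history just before the gap
    trend = ([pre[0] + (pre[-1] - pre[0]) * i / (n - 1) for i in range(n)]
             if n > 1 else [pre[0]])
    hi = [p - t for p, t in zip(pre, trend)]
    return [l + h for l, h in zip(lo, hi)]
```

On a smooth ramp the residual is zero and the fill reduces to the trend; on an oscillating record the pre-gap oscillation is replayed on top of the trend, preserving spectral character through the interruption.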