Funding: Supported by the National Key Research and Development Program of China (Project No. 2023YFC2307500).
Abstract: Influenza, an acute respiratory infectious disease caused by the influenza virus, exhibits distinct seasonal patterns in China, with peak activity occurring in winter and spring in northern regions and in winter and summer in southern areas [1]. The World Health Organization (WHO) emphasizes that early warning and epidemic intensity assessment are critical public health strategies for influenza prevention and control. Internet-based influenza surveillance, with its real-time data and low cost, effectively complements traditional methods. The Baidu Search Index, which reflects flu-related search queries, correlates strongly with influenza trends, aiding regional activity assessment and outbreak tracking [2].
Abstract: Influenza is an infectious disease that spreads quickly and widely, and its outbreaks have caused heavy losses to society. In this paper, four major categories of flu-related keywords were defined: “prevention phase”, “symptom phase”, “treatment phase”, and “commonly-used phrase”. A Python web crawler was used to collect influenza data from the National Influenza Center’s weekly influenza surveillance reports and the Baidu Index. Support vector regression (SVR), least absolute shrinkage and selection operator (LASSO), and convolutional neural network (CNN) prediction models were built with machine learning, and a time-series model (ARMA) was also established to account for the seasonal characteristics of influenza. The results show that predicting influenza from web search data is feasible and that machine learning achieves a measurable forecasting effect, giving it reference value for future influenza prediction. The ARMA(3,0) model produced the best predictions and generalized best. Finally, the limitations of this work and directions for future research are given.
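As a rough illustration of the modeling pipeline this abstract describes, the sketch below fits an ARMA(3,0) model to a weekly influenza-like-illness series and an SVR model to lagged search-index features; the synthetic data, column names, and lag choices are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical sketch (synthetic data): ARMA(3,0) on a weekly ILI series and SVR on
# lagged Baidu-Index-style features. Column names and lags are assumptions.
import numpy as np
import pandas as pd
from sklearn.svm import SVR
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
weeks = pd.date_range("2018-01-01", periods=156, freq="W")
season = 2.0 + 1.5 * np.sin(2 * np.pi * np.arange(156) / 52)        # yearly seasonality
df = pd.DataFrame({"ili_pct": season + rng.normal(0, 0.2, 156),
                   "baidu_flu": 100 * season + rng.normal(0, 5, 156)}, index=weeks)

# ARMA(3,0), i.e. ARIMA(3,0,0), fitted directly on the ILI series
arma = ARIMA(df["ili_pct"], order=(3, 0, 0)).fit()
print(arma.forecast(steps=4))                                        # next four weeks

# SVR regressing ILI% on lagged search-index features
for lag in (1, 2, 3):
    df[f"baidu_lag{lag}"] = df["baidu_flu"].shift(lag)
df = df.dropna()
X, y = df[[f"baidu_lag{l}" for l in (1, 2, 3)]].values, df["ili_pct"].values
split = int(0.8 * len(df))                                           # chronological split
svr = SVR(kernel="rbf", C=10.0).fit(X[:split], y[:split])
rmse = float(np.sqrt(np.mean((svr.predict(X[split:]) - y[split:]) ** 2)))
print(f"SVR hold-out RMSE: {rmse:.3f}")
```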
Funding: Supported by the State Key R&D Project (Grant No. 2016YFE0122200), the Civil Aerospace Scientific Research Project “Data calibration and validation for CSES”, and the Central-Level Public Welfare Research Projects of the Institute of Crustal Dynamics, China Earthquake Administration (Grant No. ZDJ2017-21).
Abstract: Four levels of data from the search coil magnetometer (SCM) onboard the China Seismo-Electromagnetic Satellite (CSES) are defined and described. The data at every level contain the three components of the waveform and/or spectrum of the induced magnetic field around the orbit in the frequency range of 10 Hz to 20 kHz, divided into an ultra-low-frequency band (ULF, 10–200 Hz), an extremely-low-frequency band (ELF, 200–2200 Hz), and a very-low-frequency band (VLF, 1.8–20 kHz). Examples of Level-2, Level-3, and Level-4 data products are presented. Initial results obtained during the commissioning test phase demonstrate that the SCM is in normal operational status and that the data are of high enough quality to reliably capture most space weather events related to low-frequency geomagnetic disturbances.
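For orientation, here is a minimal sketch of how spectral bins could be bucketed into the three bands named above; the sampling rate and placeholder spectrum are assumptions, and the band limits follow the abstract (ELF and VLF deliberately overlap between 1.8 and 2.2 kHz).

```python
# Hypothetical sketch: bucketing power-spectrum bins into the ULF/ELF/VLF bands above.
import numpy as np

BANDS = {"ULF": (10.0, 200.0), "ELF": (200.0, 2200.0), "VLF": (1800.0, 20000.0)}

def split_bands(freqs_hz, power):
    """Return the slice of the power spectrum falling in each band (bands may overlap)."""
    return {name: power[(freqs_hz >= lo) & (freqs_hz < hi)]
            for name, (lo, hi) in BANDS.items()}

freqs = np.fft.rfftfreq(51200, d=1.0 / 51200.0)   # assumed 51.2 kHz sampling, 1 s window
power = np.random.rand(freqs.size)                 # placeholder spectrum
print({name: band.size for name, band in split_bands(freqs, power).items()})
```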
Abstract: This paper improves a keyword-based search strategy over encrypted cloud data and presents a method that applies different strategies on the client and the server to improve search efficiency. The client uses Chinese and English to construct synonyms of the keywords, builds fuzzy-syllable-word and synonym sets for the keywords, and implements a fuzzy search strategy over the encrypted cloud data. The server analyzes the user’s query request, provides candidate keywords for the user to choose from, and picks out topic words and secondary words. The system matches topic words against historical queries in time order, so the result of the new request can be obtained directly. Simulation analysis shows that the fuzzy search strategy makes better use of historical results while preserving privacy, enabling efficient data search, saving search time, and improving search efficiency.
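A minimal sketch of the client-side idea under stated assumptions: a query keyword is expanded into a synonym/fuzzy set, and cached results of earlier queries are reused before a new (encrypted) search is issued. The synonym table, cache, and server_search stub are illustrative, not the paper's actual protocol.

```python
# Hypothetical sketch of client-side fuzzy/synonym expansion with reuse of query history.
SYNONYMS = {"notebook": {"laptop", "笔记本"}, "cloud": {"云", "cloud computing"}}
HISTORY: dict[frozenset, list] = {}        # cached results keyed by expanded query set

def edit1_variants(word: str) -> set[str]:
    """Crude fuzzy set: the keyword plus all single-character deletions of it."""
    return {word[:i] + word[i + 1:] for i in range(len(word))} | {word}

def expand(keyword: str) -> frozenset:
    return frozenset(edit1_variants(keyword) | SYNONYMS.get(keyword, set()))

def search(keyword: str, server_search) -> list:
    query = expand(keyword)
    for past_query, past_results in HISTORY.items():   # reuse matching historical queries
        if query & past_query:
            return past_results
    results = server_search(query)                      # encrypted search on the server (stub)
    HISTORY[query] = results
    return results

print(search("notebook", server_search=lambda q: ["doc_17", "doc_42"]))
```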
Abstract: The study investigated user experience, display complexity, display type (tables versus graphs), and task difficulty as variables affecting the user’s ability to navigate through complex visual data. A total of 64 participants took part: 39 undergraduate students (novice users) and 25 graduate students (intermediate-level users). The experiment used a 2 × 2 × 2 × 3 mixed design with two between-subjects variables (display complexity, user experience) and two within-subjects variables (display format, question difficulty). The results indicated that response time was better for graphs than for tables, especially when the questions were difficult. The intermediate users appeared to adopt more extensive search strategies than novices, as revealed by an analysis of the number of changes they made to the display before answering questions. It was concluded that designers of data displays should consider (a) the type of display, (b) the difficulty of the task, and (c) the expertise level of the user to obtain optimal levels of performance.
Abstract: This paper describes a nearest neighbor (NN) search algorithm on the GBD (generalized BD) tree. The GBD tree is a spatial data structure suitable for two- or three-dimensional data and has good performance characteristics in dynamic data environments. In GIS and CAD systems, the R-tree and its successors have been used, and NN search algorithms have also been proposed to obtain good performance from the R-tree. The GBD tree, on the other hand, is superior to the R-tree with respect to exact-match retrieval, because it carries auxiliary data that uniquely determines the position of an object in the structure. The proposed NN search algorithm exploits this property of the GBD tree, and its performance was evaluated through experiments.
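The abstract does not give the GBD tree's node layout, so the sketch below shows only the generic branch-and-bound idea that such an NN search relies on: visit the most promising subtree first and prune subtrees whose bounding boxes cannot beat the best distance found so far. The node fields and box-distance helper are assumptions rather than the GBD tree's actual structure.

```python
# Hypothetical branch-and-bound NN search over a bounding-box tree (not the actual GBD layout).
import heapq
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    box: tuple                                     # (xmin, ymin, xmax, ymax)
    points: list = field(default_factory=list)     # leaf payload
    children: list = field(default_factory=list)

def box_dist(q, box):
    """Minimum distance from query point q to an axis-aligned box."""
    xmin, ymin, xmax, ymax = box
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def nearest(root, q):
    best, best_d = None, math.inf
    heap = [(box_dist(q, root.box), id(root), root)]    # best-first frontier
    while heap:
        d, _, node = heapq.heappop(heap)
        if d >= best_d:                                 # prune: cannot beat current best
            break
        for p in node.points:
            pd = math.hypot(p[0] - q[0], p[1] - q[1])
            if pd < best_d:
                best, best_d = p, pd
        for c in node.children:
            heapq.heappush(heap, (box_dist(q, c.box), id(c), c))
    return best, best_d

leaf = Node(box=(0, 0, 1, 1), points=[(0.2, 0.3), (0.9, 0.9)])
print(nearest(Node(box=(0, 0, 1, 1), children=[leaf]), q=(1.0, 1.0)))
```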
Abstract: Digital broadcasting is a novel paradigm for next-generation broadcasting. Its goal is to provide not only better picture quality but also a variety of services that are impossible in traditional over-the-air broadcasting. One important factor in this new broadcasting environment is interoperability among broadcasting applications, since the environment is distributed. Broadcasting metadata therefore becomes increasingly important, and one metadata standard for digital broadcasting is TV-Anytime metadata. TV-Anytime metadata is defined using XML Schema, so its instances are XML data. To achieve interoperability, a standard query language is also required, and XQuery is a natural choice. There has been some research on handling broadcasting metadata; in our previous study, we proposed a method for efficiently managing broadcasting metadata at a service provider. However, a set-top box for digital broadcasting is a constrained, low-cost environment, so general metadata-management approaches cannot be applied to it directly. This paper proposes a method for efficiently managing broadcasting metadata on the set-top box, together with a prototype metadata management system for evaluating the method. The system consists of a storage engine that stores the metadata and an XQuery engine that searches the stored metadata, using a special index for storing and searching. The two engines are designed independently of the hardware platform, so they can be used in any low-cost application that manages broadcasting metadata.
Abstract: Unlike consumers in malls or supermarkets, online consumers are “intangible”, and their purchasing behavior is affected by multiple factors, including product pricing, promotions and discounts, the quality of products and brands, and the platforms where they search for the product. In this research, I study the relationship between product sales and consumer characteristics, the relationship between product sales and product quality, demand curve analysis, and the search friction effect across platforms. I use data from a randomized field experiment involving more than 400 thousand customers and 30 thousand products on JD.com, one of the world’s largest online retailing platforms. The research has two focuses: 1) how different consumer characteristics affect sales; 2) how to set prices and account for possible search friction on different channels. I find that JD Plus membership, education level, and age have no significant relationship with product sales, while a higher user level leads to higher sales. Sales are highly skewed, with very high numbers of products sold making up only a small percentage of the total. Consumers living in more industrialized cities have more purchasing power, and women and singles spend more. The better a product performs, the more it sells, and moderate pricing can increase product sales. Based on the results for search volume in different channels, it is better to focus on app sales. Knowing these results, producers can adjust the target consumers for different products and run targeted advertisements to maximize sales; an appropriate price is also crucial to a seller. In addition, knowing the search friction of different channels can help producers rearrange the platform layout so that search friction is reduced and more potential deals can be made.
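As a toy illustration of the demand-curve part of this analysis, the sketch below estimates a constant price elasticity by regressing log sales on log price with OLS; the data and column names are placeholders, and no JD.com data is involved.

```python
# Hypothetical log-log demand regression: the price coefficient is the elasticity estimate.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({            # placeholder data standing in for the experiment's products
    "units_sold": [120, 95, 60, 45, 30],
    "price":      [10.0, 12.0, 15.0, 18.0, 22.0],
})
X = sm.add_constant(np.log(df[["price"]]))
model = sm.OLS(np.log(df["units_sold"]), X).fit()
print(model.params)            # the 'price' coefficient approximates the price elasticity
```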
Funding: Supported by the Social Science Foundation of Beijing (15JGB099, 15ZHA004), the National Natural Science Foundation of China (61370139), and the “Information+” Special Fund (5111823610).
Abstract: In the age of information sharing, logistics information sharing also faces the risk of privacy leakage. To address the leakage of time-series location information in logistics, this paper proposes a differential-privacy method for publishing time-series location data. First, it constructs time-related public regions of interest (PROI) using an optimized clustering algorithm, and takes the centroid of each region as the public interest point (PIP) representing the location of that public interest zone. Second, a location search tree (LST), a commonly used index structure for spatial data, is built from the PIPs to preserve the inherent relations among the location data. Third, Laplace noise is added to the nodes of the LST, which requires adding noise fewer times than perturbing the original data set and thus preserves data availability. Finally, experiments show that this method not only ensures the security of sequential location data publishing, but also offers better data availability than the general differential privacy method, achieving a good balance between the security and availability of the data.
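A minimal sketch of the noise-injection step, assuming a toy tree of visit counts: Laplace noise with scale sensitivity/epsilon is added to each node's count, mirroring the idea of perturbing LST nodes rather than every raw record. The node structure, sensitivity, and epsilon value are illustrative assumptions.

```python
# Hypothetical Laplace perturbation of per-node counts in a location index tree.
import numpy as np

rng = np.random.default_rng(0)

def add_laplace(node, epsilon, sensitivity=1.0):
    """Recursively add Laplace(sensitivity/epsilon) noise to each node's count."""
    node["noisy_count"] = node["count"] + rng.laplace(scale=sensitivity / epsilon)
    for child in node.get("children", []):
        add_laplace(child, epsilon, sensitivity)
    return node

lst = {"count": 40, "children": [{"count": 25, "children": []},
                                 {"count": 15, "children": []}]}
print(add_laplace(lst, epsilon=0.5))
```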
Funding: Supported by the National Natural Science Foundation of China (Nos. 61703014 and 62073008).
Abstract: To avoid systemic financial risks caused by extreme fluctuations in housing prices, the Chinese government has been exploring the most effective policies for regulating the housing market. Measuring the effect of real estate regulation policies has been a challenge for existing studies. This study employs big data technology to obtain Internet search data (ISD) and construct a market concern index (MCI) of policy, and uses hedonic price theory to construct a hedonic price index (HPI) based on building area, age, ring number, and other hedonic variables. The impact of market concerns for restrictive policy, monetary policy, fiscal policy, security policy, and administrative supervision policy on housing prices is then evaluated. Compared with the common housing price index, the hedonic price index accounts for the heterogeneity of houses and better reflects changes in housing prices caused by market supply and demand. The results indicate that (1) a long-term interaction exists between housing prices and market concerns for policy (MCP); and (2) market concerns for restrictive policy and administrative supervision policy effectively restrain rising housing prices, while those for monetary and fiscal policy have the opposite effect. The results could serve as a useful reference for governments aiming to stabilize their real estate markets.
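As one plausible reading of the HPI construction, the sketch below runs a time-dummy hedonic regression of log price on hedonic attributes plus period dummies and reads the exponentiated period coefficients as a price index; the columns and data are placeholders, not the study's dataset or exact specification.

```python
# Hypothetical time-dummy hedonic regression yielding a quarterly price index.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({                      # placeholder listings
    "price":  [300, 320, 280, 350, 410, 330, 390, 360],
    "area":   [90, 100, 80, 110, 120, 95, 105, 88],
    "age":    [10, 5, 20, 8, 3, 12, 6, 15],
    "ring":   [3, 2, 4, 2, 1, 3, 2, 4],        # ring road number as a location proxy
    "period": ["2020Q1", "2020Q1", "2020Q2", "2020Q2",
               "2020Q3", "2020Q3", "2020Q1", "2020Q3"],
})
model = smf.ols("np.log(price) ~ area + age + ring + C(period)", data=df).fit()
index = np.exp(model.params.filter(like="C(period)"))   # price relatives vs. the base period
print(index)
```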
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 61303074, 61309013), the National Key Basic Research and Development Program (“973”) of China (No. 2012CB315900), and the Programs for Science and Technology Development of Henan Province (Nos. 12210231003, 13210231002).
Abstract: To address the issue that traditional clustering methods are not appropriate for high-dimensional data, a cuckoo search fuzzy-weighting algorithm for subspace clustering is presented on the basis of existing soft subspace clustering algorithms. In the proposed algorithm, a novel objective function is first designed by considering the fuzzy-weighted within-cluster compactness and the between-cluster separation, and by loosening the constraints on the dimension weight matrix. Gradual membership and an improved cuckoo search, a global search strategy, are then introduced to optimize the objective function and search for subspace clusters, giving novel learning rules for clustering. Finally, the performance of the proposed algorithm in clustering various low- and high-dimensional datasets is compared experimentally with that of several competitive subspace clustering algorithms. The experimental studies demonstrate that the proposed algorithm outperforms most of the existing soft subspace clustering algorithms.
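Since the exact objective function is not given in the abstract, the sketch below evaluates one plausible form of a fuzzy-weighted soft-subspace objective (weighted within-cluster compactness minus between-cluster separation) for candidate centers, dimension weights, and memberships; the cuckoo-search (Lévy flight) update itself is omitted, and all symbols and the trade-off parameter are assumptions.

```python
# Hypothetical soft-subspace objective: fuzzy-weighted compactness minus separation.
import numpy as np

def subspace_objective(X, centers, W, U, gamma=0.1, tau=2.0):
    """X: (n,d) data; centers: (k,d); W: (k,d) dimension weights; U: (n,k) memberships."""
    global_mean = X.mean(axis=0)
    compact = sep = 0.0
    for j in range(centers.shape[0]):
        d2 = (X - centers[j]) ** 2                      # per-dimension squared distances
        compact += np.sum(U[:, j, None] ** tau * (W[j] ** tau) * d2)
        sep += np.sum((W[j] ** tau) * (centers[j] - global_mean) ** 2)
    return compact - gamma * sep                        # smaller is better

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))
k, d = 3, X.shape[1]
centers = X[rng.choice(len(X), k, replace=False)]       # candidate cluster centers
W = np.full((k, d), 1.0 / d)                            # relaxed dimension weights
U = rng.dirichlet(np.ones(k), size=len(X))              # fuzzy memberships per point
print(subspace_objective(X, centers, W, U))
```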