Data Mining (DM) methods are being increasingly used in prediction with time series data, in addition to traditional statistical approaches. This paper presents a literature review of the use of DM with time series data, focusing on short-term stock prediction. This is an area that has been attracting a great deal of attention from researchers in the field. The main contribution of this paper is to provide an outline of the use of DM with time series data, using mainly examples related to short-term stock prediction. This is important for a better understanding of the field. Some of the main trends and open issues are also introduced.
This study is an exploratory analysis of applying natural language processing techniques such as Term Frequency-Inverse Document Frequency (TF-IDF) and Sentiment Analysis to Twitter data. The uniqueness of this work lies in determining the overall sentiment of a politician's tweets based on the TF-IDF values of the terms used in their published tweets. By calculating the TF-IDF value of terms from the corpus, this work displays the correlation between TF-IDF score and polarity. The results show that calculating the TF-IDF score of the corpus allows for a more accurate representation of the overall polarity, since terms are given a weight based on their uniqueness and relevance rather than just the frequency at which they appear in the corpus.
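The weighting idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration only: the abstract does not give the authors' exact formula, so the TF-IDF variant (relative term frequency times the natural log of inverse document frequency), the `weighted_polarity` aggregation, the toy tweets and the two-word lexicon are all assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

def weighted_polarity(doc_scores, lexicon):
    """TF-IDF-weighted mean polarity over the terms found in the lexicon."""
    num = sum(w * lexicon[t] for t, w in doc_scores.items() if t in lexicon)
    den = sum(w for t, w in doc_scores.items() if t in lexicon)
    return num / den if den else 0.0

# Hypothetical toy corpus and polarity lexicon (not from the study).
tweets = [
    "great rally great crowd".split(),
    "terrible policy great debate".split(),
    "terrible traffic today".split(),
]
lexicon = {"great": 1.0, "terrible": -1.0}

scores = tf_idf(tweets)
polarities = [weighted_polarity(s, lexicon) for s in scores]
```

A term that occurs in every document receives a weight of zero under this variant, which is the sense in which uniqueness, not raw frequency, drives the aggregate polarity.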
We study a class of nonlinear parabolic equations of the type $\frac{\partial b(u)}{\partial t} - \operatorname{div}(a(x,t,u)\nabla u) + g(u)|\nabla u|^2 = f$, where the right-hand side belongs to $L^1(Q)$, $b$ is a strictly increasing $C^1$-function and $-\operatorname{div}(a(x,t,u)\nabla u)$ is a Leray-Lions operator. The function $g$ is just assumed to be continuous on $\mathbb{R}$ and to satisfy a sign condition. Without any additional growth assumption on $u$, we prove the existence of a renormalized solution.
This paper describes the implementation of a data logger for the real-time in-situ monitoring of hydrothermal systems. A compact mechanical structure ensures the security and reliability of the data logger when used in the deep sea. The data logger is a battery-powered instrument that can connect to chemical sensors (pH electrode, H2S electrode, H2 electrode) and temperature sensors. To achieve major energy savings, dynamic power management is implemented in both the hardware and the software design. The working current of the data logger is 15 μA in idle mode and 1.44 mA in active mode, which greatly extends the working time of the battery. The data logger was successfully tested in the first Sino-American Cooperative Deep Submergence Project from August 13 to September 3, 2005.
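To see why the two reported currents matter, a back-of-the-envelope duty-cycle calculation helps. The sketch below uses only the idle and active currents stated in the abstract. The 1% active fraction and the 1000 mAh battery capacity are hypothetical values chosen for illustration, and the runtime model ignores self-discharge and sensor loads.

```python
IDLE_CURRENT_A = 15e-6      # idle-mode current reported in the abstract
ACTIVE_CURRENT_A = 1.44e-3  # active-mode current reported in the abstract

def average_current(active_fraction):
    """Time-weighted mean current for a given fraction of time spent active."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return (active_fraction * ACTIVE_CURRENT_A
            + (1.0 - active_fraction) * IDLE_CURRENT_A)

def runtime_hours(capacity_mah, active_fraction):
    """Idealized runtime: capacity divided by mean current in mA."""
    return capacity_mah / (average_current(active_fraction) * 1000.0)

# Hypothetical scenario: active 1% of the time on a 1000 mAh battery.
mean_a = average_current(0.01)
hours = runtime_hours(1000.0, 0.01)
```

Under these assumed numbers the mean draw is about 29 μA, roughly fifty times lower than staying in active mode continuously.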
A long-term dataset of photosynthetically active radiation (Qp) is reconstructed from a broadband global solar radiation (Rs) dataset through an all-weather reconstruction model. This method is based on four years' worth of data collected in Beijing. Observation data of Rs and Qp from 2005-2008 are used to investigate the temporal variability of Qp and its dependence on the clearness index and solar zenith angle. A simple and efficient all-weather, empirically derived reconstruction model is proposed to reconstruct Qp from Rs. This reconstruction method is found to estimate instantaneous Qp with high accuracy. The annual mean of the daily values of Qp during the period 1958-2005 is 25.06 mol m^-2 d^-1. The magnitude of the long-term trend for the annual averaged Qp is presented (-0.19 mol m^-2 yr^-1 from 1958-1997 and -0.12 mol m^-2 yr^-1 from 1958-2005). The trend in Qp exhibits sharp decreases in the spring and summer and more gentle decreases in the autumn and winter.
Background: In the past decade, many researchers have focused on robot-assisted surgery. However, with respect to long-term outcomes for patients with early-stage non-small cell lung cancer (NSCLC), whether the robotic procedure is superior to video-assisted thoracic surgery (VATS) and thoracotomy is unclear. Nonetheless, in the article titled "Long-term survival based on the surgical approach to lobectomy for clinical stage I non-small cell lung cancer: comparison of robotic, video assisted thoracic surgery, and thoracotomy lobectomy" by Yang et al., recently published in Annals of Surgery, the authors provided convincing evidence that the robotic procedure results in long-term survival similar to that of VATS and thoracotomy. Minimally invasive procedures typically result in shorter lengths of hospital stay, and the robotic procedure in particular results in superior lymph node assessment. Main body: Our propensity score-matched study generated high-quality data. Based on our findings, we see promise in expanding patient access to robotic lung resections. In this study, propensity score matching minimized the bias between groups. Nevertheless, due to its retrospective nature, bias may still exist. Currently, the concept of rapid rehabilitation is widely accepted, and it is very difficult to set up a randomized controlled trial to compare robotic, VATS, and thoracotomy procedures for the treatment of NSCLC. Therefore, to overcome this limitation and to minimize bias, the best approach is to use a registry and prospectively collected, propensity score-matched data. Conclusions: Robotic lung resections result in long-term survival similar to that of VATS and thoracotomy. Robot-assisted and VATS procedures are associated with short lengths of hospital stay, and the robotic procedure in particular results in superior lymph node assessment. Considering the alarming increase in the incidence of lung cancer in China, a nationwide database of prospectively collected data available for clinical research would be especially important.
This paper focuses on facilitating state-of-the-art applications of big data analytics (BDA) architectures and infrastructures in the telecommunications (telecom) industrial sector. Telecom companies deal with terabytes to petabytes of data on a daily basis. IoT applications in telecom are further contributing to this data deluge. Recent advances in BDA have exposed new opportunities to get actionable insights from telecom big data. These benefits and the fast-changing BDA technology landscape make it important to investigate existing BDA applications in the telecom sector. For this, we first identify published research on BDA applications in telecom through a systematic literature review, through which we filter 38 articles and categorize them into frameworks, use cases, literature reviews, white papers and experimental validations. We also discuss the benefits and challenges mentioned in these articles. We find that the experiments are all proofs of concept (POC) on a severely limited BDA technology stack (as compared to the available technology stack), i.e., we did not find any work focusing on a full-fledged BDA implementation in an operational telecom environment. To facilitate these applications at research level, we propose a state-of-the-art lambda architecture for BDA pipeline implementation (called LambdaTel) based completely on open source BDA technologies and the standard Python language, along with relevant guidelines. We discovered only one research paper that presented a relatively limited lambda architecture using the proprietary AWS cloud infrastructure. We believe LambdaTel presents a clear roadmap for telecom industry practitioners to implement and enhance BDA applications in their enterprises.
The backup requirement of data centres is tremendous, as the amount of data created by humans is massive and is increasing exponentially. Single-node deduplication cannot meet the increasing backup requirement of data centres. A feasible alternative is the deduplication cluster, which can meet it by adding storage nodes. The data routing strategy is the key to the deduplication cluster. DRSS (data routing strategy using semantics) greatly improves on the storage utilization of the MCS (minimum chunk signature) data routing strategy. However, for a large deduplication cluster, the load balance of DRSS is worse than that of MCS. To improve the load balance of DRSS, we propose a load balance strategy for DRSS, namely DRSSLB. When a node is overloaded, DRSSLB iteratively migrates the current smallest container of the node to the smallest node in the deduplication cluster until the overloaded node is no longer overloaded. A container is the minimum unit of data migration. Similar files sharing the same features or file names are stored in the same container. This ensures that similar data groups are still in the same node after the nodes are rebalanced. We use a real-world dataset to evaluate DRSSLB. Experimental results show that, for various numbers of nodes in the deduplication cluster, the data skew of DRSSLB stays under a predefined value while the storage utilization of DRSSLB barely increases compared with DRSS, at a low penalty (the data migration rate is only 6.5% when the number of nodes is 64).
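The migration loop described in this abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation: the abstract does not define "overloaded", so the fixed `limit` threshold is an assumption, as are the data layout (a dict of nodes, each a dict of container sizes) and all names and numbers in the example.

```python
def rebalance(nodes, overloaded, limit):
    """Migrate smallest containers off `overloaded` until its load <= limit.

    nodes: dict mapping node name -> dict of container name -> size.
    Returns the list of (container, destination) migrations performed.
    """
    def load(name):
        return sum(nodes[name].values())

    moves = []
    while load(overloaded) > limit and nodes[overloaded]:
        # Smallest container currently stored on the overloaded node.
        container = min(nodes[overloaded], key=nodes[overloaded].get)
        # Least-loaded other node in the cluster receives it.
        target = min((n for n in nodes if n != overloaded), key=load)
        # Containers move whole, so similar files grouped in one
        # container stay together on the destination node.
        nodes[target][container] = nodes[overloaded].pop(container)
        moves.append((container, target))
    return moves

# Hypothetical three-node cluster; sizes and limit are arbitrary units.
cluster = {
    "n1": {"c1": 50, "c2": 10, "c3": 20},
    "n2": {"c4": 12},
    "n3": {"c5": 5},
}
moves = rebalance(cluster, "n1", limit=60)
```

Moving the smallest container first keeps each migration cheap, at the cost of possibly needing several iterations before the node drops below the threshold.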
Named-data Networking (NDN) is a promising future Internet architecture that introduces some evolutionary elements into layer 3, e.g., consumer-driven communication, soft state on the data forwarding plane and hop-by-hop traffic control. These elements ensure that data holders return only the requested data within the lifetime of the request, instead of pushing data whenever needed, whatever it is. Despite the debate over these advantages and their costs, this pattern requires data consumers to keep sending requests at the right moments for continuous data transmission, resulting in significant forwarding cost and sophisticated application design. In this paper, we propose the Interest Set (IS) mechanism, which compresses a set of similar Interests into one request and maintains a relatively long-term data returning path with soft state and continuous feedback from upstream. In this way, IS relaxes the above requirement and scales NDN data forwarding by reducing the forwarded requests and soft states needed to retrieve a given set of data.
Using the 45-year European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis wave data (ERA-40), the long-term trends of the sea surface wind speed and wave height (wind wave, swell, mixed wave) in the global ocean on a 1.5°×1.5° grid during the last 44 years are analyzed. A majority of the global ocean swell wave height exhibits a significant linear increasing trend (2-8 cm/decade), and the distribution of the annual linear trend of the significant wave height (SWH) is highly consistent with that of the swell wave height. The sea surface wind speed shows an annual linear increasing trend concentrated mainly in most waters of the Southern Hemisphere westerlies, the high latitudes of the North Pacific, the Indian Ocean north of 30°S, the waters near the western equatorial Pacific and the low latitudes of the Atlantic, and an annual linear decreasing trend mainly in the central and eastern equatorial Pacific, near the Juan Fernandez Archipelago, and in the waters near South Georgia Island in the Atlantic. The distribution of the linear trend of the wind wave height is similar to that of the sea surface wind speed. Another finding is that swell is dominant in the mixed wave: the swell index in the central ocean is generally greater than that offshore, the swell index along the eastern ocean coast is greater than that along the western ocean coast, and in the year-round hemisphere westerlies the swell index is relatively low.
This paper is concerned with a novel Lyapunov-like functional approach to the stability of sampled-data systems with variable sampling periods. The Lyapunov-like functional has four striking characteristics compared to the usual ones. First, it is time-dependent. Second, it may be discontinuous. Third, not every term of it is required to be positive definite. Fourth, the Lyapunov functional includes not only the state and the sampled state but also the integral of the state. By using a recently reported inequality to estimate the derivative of this Lyapunov functional, a sampled-interval-dependent stability criterion with reduced conservatism is obtained. The stability criterion is further extended to sampled-data systems with polytopic uncertainties. Finally, three examples are given to illustrate the reduced conservatism of the stability criteria.
We present a novel paradigm of sensor placement concerning data precision and estimation. Multiple abstract sensors are used to measure a quantity of a moving target in a wireless sensor network scenario. These sensors can cooperate with each other to obtain a precise estimate of the quantity in real time. We consider the problem of planning a minimum-cost sensor placement scheme with the desired data precision and resource consumption. Measured data is modeled as a Gaussian random variable with a changeable variance. A grid model is used to approximate the problem. We solve the problem with a heuristic algorithm using the branch-and-bound method and tabu search. Our experiments demonstrate that the algorithm is correct within a certain tolerance, and that it is also efficient and scalable.
In this paper, firstly, based on new non-tensor-product-type partially inverse divided difference algorithms in recursive form, scattered data interpolating schemes are constructed via bivariate continued fractions with odd and even nodes, respectively. Equivalent identities are also obtained between the interpolated functions and the bivariate continued fractions. Secondly, by means of three-term recurrence relations for continued fractions, a characterization theorem is presented to study the degrees of the numerators and denominators of the interpolating continued fractions. Thirdly, some numerical examples show the feasibility of the novel recursive schemes. Meanwhile, compared with the degrees of the numerators and denominators of bivariate Thiele-type interpolating continued fractions, those of the new bivariate interpolating continued fractions are much lower, due to the reduction of redundant interpolating nodes. Finally, the operation count for the rational function interpolation is smaller than that for radial basis function interpolation.
A four-dimensional data assimilation (FDDA) scheme based on Newtonian relaxation (or “nudging”) was tested using observational asynoptic data collected at a coastal site in the Central Mediterranean peninsula of Calabria, southern Italy. The study refers to an experimental campaign carried out in summer 2008. For this period a wind profiler, a sodar and two surface meteorological stations were considered. The collected measurements were used for the FDDA scheme, and the technique was incorporated into a tailored version of the Regional Atmospheric Modeling System (RAMS). All instruments are installed and operated routinely at the experimental field of the CRATI-ISAC/CNR, located 600 m from the Tyrrhenian coastline. Several simulations were performed, and the results show that the assimilation of wind and/or temperature data, both throughout the simulation time (continuous FDDA) and for a 12 h time window (forecasting configuration), improves the model performance. Over a whole single day, improvements are substantial in the case of continuous FDDA, while they are smaller in the case of the forecasting configuration. Enhancements during the first six hours of each run are generally higher. The resulting meteorological fields are finalised as input to air quality and agro-meteorological models, for short-term forecasts of renewable energy production, and for atmospheric model initialization.
Dual-image reversible data hiding (DIRDH) is a technique that produces two camouflage images after embedding secret data into one original image. Moreover, not only can the secret data be extracted from the two camouflage images, but the original image can also be recovered. To achieve high image quality, Lu et al.'s method applied least-significant-bit (LSB) matching revisited to DIRDH. To further improve the image quality, the proposed method modifies the LSB matching revisited rules and applies them to DIRDH. According to the experimental results, the image quality of the proposed method is better than that of Lu et al.'s method.
We use the wavelet transform to analyze the daily relative sunspot number series over solar cycles 10-23. The characteristics of some of the periods shorter than ~600 days are discussed. The results exhibit not only the variation of some short periods over the 14 solar cycles but also the characteristics and differences around solar peak and valley years. The short periodic components with larger amplitudes, such as ~27, ~150 and ~360 days, are obvious in some solar cycles. All of them are time-variable, and their lengths and amplitudes are variable and intermittent in time. The variable characteristics of the periods differ considerably between solar cycles.
An outsourced database is a database service provided by cloud computing companies. Using an outsourced database can reduce hardware and software costs and provide more efficient and reliable data processing capacity. However, the outsourced database still faces some challenges. If the service provider is not sufficiently trustworthy, data leakage is possible. The data may contain users' private information, so data leakage may cause a privacy leak. For this reason, protecting the privacy of data in the outsourced database becomes very important. In the past, scholars proposed k-anonymity to protect data privacy in the database. It makes data anonymous to avoid privacy leaks. But k-anonymity has some problems: it is irreversible, and it is vulnerable to the homogeneity attack and the background knowledge attack. Later studies addressed the homogeneity attack and the background knowledge attack, but they still cannot recover the original data. In this paper, we propose a data anonymity method that is reversible and also prevents those two attacks. Our study is based on the proposed r-transform, which can be used on numeric attributes in the outsourced database. In the experiments, we discuss the time required to anonymize and recover data. Furthermore, we investigate the defense against the homogeneity attack and the background knowledge attack. Finally, we summarize the proposed method and future research.
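Short-term forecasting plays an important role in smart grid construction and profoundly affects the intelligent transformation of every stage of the grid: generation, transmission, transformation, distribution and consumption. Short-term forecasting is generally based on measured system data, but sensor failures, data transmission errors and other causes degrade data quality and seriously affect forecasting accuracy. To build an accurate short-term forecasting model under degraded data quality, a short-term forecasting framework, Bi-LSTM-DP (bi-directional long short-term memory data preprocessing), is proposed that combines data preprocessing with bi-directional long short-term memory (Bi-LSTM). In Bi-LSTM-DP, missing values in the collected data are first filled with the mean, the data are then denoised with a Savitzky-Golay filter, and finally Bi-LSTM extracts the time series information to produce the short-term forecast. To evaluate the performance of the proposed method, measured public datasets are used to forecast wind power generation and load demand, respectively. Comparison with other reference methods demonstrates the effectiveness and robustness of the proposed method.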
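The two preprocessing steps named in this abstract (mean imputation, then Savitzky-Golay denoising) can be sketched in plain Python. The abstract does not state the filter's window length or polynomial order, so the 5-point quadratic filter used here, with its standard weights (-3, 12, 17, 12, -3)/35, is an assumption, as are the sample readings and the choice to leave edge points unsmoothed. The Bi-LSTM forecasting stage is not shown.

```python
import math

# Savitzky-Golay smoothing weights for a 5-point window and a quadratic fit.
SG5_QUADRATIC = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)

def fill_missing_with_mean(series):
    """Replace NaN entries with the mean of the observed values."""
    observed = [x for x in series if not math.isnan(x)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(x) else x for x in series]

def savgol_smooth(series):
    """Apply the 5-point quadratic filter; two points at each edge are kept as-is."""
    half = len(SG5_QUADRATIC) // 2
    out = list(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        out[i] = sum(w * x for w, x in zip(SG5_QUADRATIC, window))
    return out

# Hypothetical sensor readings with one missing value.
raw = [1.0, 2.0, float("nan"), 4.0, 5.0, 6.0, 3.0]
filled = fill_missing_with_mean(raw)
smoothed = savgol_smooth(filled)
```

Because the filter fits a local quadratic, it leaves any exactly quadratic signal unchanged while attenuating point-to-point noise, which is why it tends to preserve peaks better than a plain moving average.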
Funding (data logger for hydrothermal systems): supported by the International Cooperative Key Project (Grant No. 2004DFA04900) of the Ministry of Sciences and Technology of PRC, and the National Natural Science Foundation of China (Grant Nos. 40637037 and 50675198).
Funding (reconstruction of photosynthetically active radiation): supported by the National Basic Research Program of China (No. 2007CB407303).
Funding (BDA applications in telecom): supported in part by the Big Data Analytics Laboratory (BDALAB) at the Institute of Business Administration under the research grant approved by the Higher Education Commission of Pakistan (www.hec.gov.pk) and the Darbi company (www.darbi.io).
Funding (DRSSLB load balancing): supported by the National Natural Science Foundation of China under Grant No. 61373120 and the Aeronautical Science Foundation of China under Grant No. 2014ZD53049.
Funding: Supported by the National High-tech R&D Program ("863" Program) of China (No. 2013AA013505) and the National Science Foundation of China (No. 61472213).
Abstract: Named-data Networking (NDN) is a promising future Internet architecture, which introduces some evolutionary elements into layer 3, e.g., consumer-driven communication, soft state on the data forwarding plane and hop-by-hop traffic control. Those elements ensure that data holders return only the requested data within the lifetime of the request, instead of pushing data whenever needed and whatever it is. Despite the dispute over these advantages and their costs, this pattern requires data consumers to keep sending requests at the right moments for continuous data transmission, resulting in significant forwarding cost and sophisticated application design. In this paper, we propose the Interest Set (IS) mechanism, which compresses a set of similar Interests into one request and maintains a relatively long-term data returning path with soft state and continuous feedback from upstream. In this way, IS relaxes the above requirement and scales NDN data forwarding by reducing the forwarded requests and soft states needed to retrieve a given set of data.
Funding: The National Basic Research Program of China under Contract No. 2012CB957803.
Abstract: Utilizing the 45-year European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis wave data (ERA-40), the long-term trends of the sea surface wind speed and the wave height (wind wave, swell, mixed wave) in the global ocean on a 1.5°×1.5° grid during the last 44 years are analyzed. It is discovered that the swell wave height over a majority of the global ocean exhibits a significant linear increasing trend (2-8 cm/decade), and the distribution of the annual linear trend of the significant wave height (SWH) is in good agreement with that of the swell wave height. The sea surface wind speed shows an annual linear increasing trend mainly concentrated in most waters of the Southern Hemisphere westerlies, the high latitudes of the North Pacific, the Indian Ocean north of 30°S, the waters near the western equatorial Pacific and the low latitudes of the Atlantic, and an annual linear decreasing trend mainly in the central and eastern equatorial Pacific, around the Juan Fernandez Archipelago, and in the waters near South Georgia Island in the Atlantic. The distribution of the linear trend of the wind wave height is similar to that of the sea surface wind speed. Another finding is that the swell is dominant in the mixed wave: the swell index in the central ocean is generally greater than that offshore, the swell index along the eastern ocean coast is greater than that in the western ocean inshore, and in the year-round hemisphere westerlies the swell index is relatively low.
Funding: Supported by the National Natural Science Foundation of China (61374090); the Program for Scientific Research Innovation Team in Colleges and Universities of Shandong Province; and the Taishan Scholarship Project of Shandong Province.
Abstract: This paper is concerned with a novel Lyapunov-like functional approach to the stability of sampled-data systems with variable sampling periods. The Lyapunov-like functional has four striking characteristics compared with usual ones. First, it is time-dependent. Second, it may be discontinuous. Third, not every term of it is required to be positive definite. Fourth, the Lyapunov functional includes not only the state and the sampled state but also the integral of the state. By using a recently reported inequality to estimate the derivative of this Lyapunov functional, a sampled-interval-dependent stability criterion with reduced conservatism is obtained. The stability criterion is further extended to sampled-data systems with polytopic uncertainties. Finally, three examples are given to illustrate the reduced conservatism of the stability criteria.
Funding: Supported by the Project of Fok Ying Tong Education Foundation (No. 104030); the Key Project of the National Natural Science Foundation of China (No. 70531020); the Project of New Century Excellent Talent (No. NCET-06-0382); the Key Project of the Education Ministry of China (No. 306023); and the Project of Doctoral Education (No. 20070247075).
Abstract: We present a novel paradigm of sensor placement concerning data precision and estimation. Multiple abstract sensors are used to measure a quantity of a moving target in a wireless sensor network scenario. These sensors can cooperate with each other to obtain a precise estimate of the quantity in real time. We consider the problem of planning a minimum-cost sensor placement scheme with desired data precision and resource consumption. Measured data is modeled as a Gaussian random variable with a changeable variance. A grid model is used to approximate the problem. We solve the problem with a heuristic algorithm using the branch-and-bound method and tabu search. Our experiments demonstrate that the algorithm is correct within a certain tolerance, and that it is also efficient and scalable.
Funding: Supported by the Special Funds Tianyuan for the National Natural Science Foundation of China (Grant No. 11426086); the Fundamental Research Funds for the Central Universities (Grant No. 2016B08714); and the Natural Science Foundation of Jiangsu Province for the Youth (Grant No. BK20160853).
Abstract: In this paper, first, based on new non-tensor-product-typed partially inverse divided difference algorithms in a recursive form, scattered data interpolating schemes are constructed via bivariate continued fractions with odd and even nodes, respectively. Equivalent identities are also obtained between the interpolated functions and the bivariate continued fractions. Second, by means of three-term recurrence relations for continued fractions, a characterization theorem is presented to study the degrees of the numerators and denominators of the interpolating continued fractions. Third, some numerical examples demonstrate the feasibility of the novel recursive schemes. Meanwhile, compared with the degrees of the numerators and denominators of bivariate Thiele-typed interpolating continued fractions, those of the new bivariate interpolating continued fractions are much lower, owing to the reduction of redundant interpolating nodes. Finally, the operation count for the rational function interpolation is smaller than that for radial basis function interpolation.
Abstract: A four-dimensional data assimilation (FDDA) scheme based on Newtonian relaxation (or "nudging") was tested using observational asynoptic data collected at a coastal site in the Central Mediterranean peninsula of Calabria, southern Italy. The study refers to an experimental campaign carried out in summer 2008. For this period a wind profiler, a sodar and two surface meteorological stations were considered. The collected measurements were used for the FDDA scheme, and the technique was incorporated into a tailored version of the Regional Atmospheric Modeling System (RAMS). All instruments are installed and operated routinely at the experimental field of the CRATI-ISAC/CNR, located 600 m from the Tyrrhenian coastline. Several simulations were performed, and the results show that the assimilation of wind and/or temperature data, both throughout the simulation time (continuous FDDA) and for a 12 h time window (forecasting configuration), improves the model performance. Over a whole single day, improvements are substantial in the case of continuous FDDA, while they are smaller in the case of the forecasting configuration. Enhancements during the first six hours of each run are generally higher. The resulting meteorological fields are intended as input to air quality and agro-meteorological models, for short-term forecasts of renewable energy production, and for atmospheric model initialization.
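Newtonian relaxation adds a term to the model tendency that pulls the state toward an observation, in proportion to the model-observation difference. The sketch below applies that idea to a scalar toy model integrated with forward Euler. It is only an illustration under stated assumptions: the toy tendency, the nudging coefficient `g`, the time step, and the function name `integrate` are all hypothetical, and none of it reflects the RAMS configuration used in the study.

```python
def integrate(x0, tendency, obs=None, g=0.0, dt=0.01, steps=2000):
    """Forward-Euler integration of dx/dt = tendency(x) + g * (obs - x).

    x0       : initial state
    tendency : function giving the model's own tendency (toy model here)
    obs      : observed value to nudge toward, or None for a free run
    g        : nudging (relaxation) coefficient, in 1/time units (assumed value)
    """
    x = x0
    for _ in range(steps):
        dxdt = tendency(x)
        if obs is not None:
            # Newtonian relaxation term: proportional to the observation increment.
            dxdt += g * (obs - x)
        x += dt * dxdt
    return x


def toy_tendency(x):
    # Hypothetical biased model that relaxes toward 10 instead of the observed 15.
    return -0.5 * (x - 10.0)


free_run = integrate(0.0, toy_tendency)                   # settles near 10
nudged_run = integrate(0.0, toy_tendency, obs=15.0, g=2.0)  # settles near 14
```

With these toy numbers the nudged equilibrium lies where the two tendencies balance, i.e. -0.5(x - 10) + 2(15 - x) = 0, so the state is drawn most of the way toward the observation without being overwritten by it.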
Funding: Supported by MOST under Grants No. 105-2410-H-468-010 and No. 105-2221-E-468-019.
Abstract: Dual-image reversible data hiding (DIRDH) is a technique that produces two camouflage images after embedding secret data into one original image. Moreover, not only can the secret data be extracted from the two camouflage images, but the original image can also be recovered. To achieve high image quality, Lu et al.'s method applied least-significant-bit (LSB) matching revisited to DIRDH. To further improve the image quality, the proposed method modifies the LSB matching revisited rules and applies them to DIRDH. According to the experimental results, the image quality of the proposed method is better than that of Lu et al.'s method.
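For context, the classic LSB matching revisited rule that this line of work builds on embeds two secret bits into a pixel pair while changing at most one pixel by at most 1. The sketch below shows that base rule only. It is not Lu et al.'s dual-image scheme, nor the modified rules proposed in the abstract above. Two simplifications are assumed: pixel values are restricted to 1..254 so that boundary handling can be skipped, and the ±1 choice for the second pixel is fixed to +1 rather than random.

```python
def lsb(v):
    return v & 1


def f(a, b):
    # Binary function used by LSB matching revisited: LSB(floor(a / 2) + b).
    return lsb(a // 2 + b)


def embed_pair(x1, x2, m1, m2):
    """Embed bits m1, m2 into pixels x1, x2 (both assumed to be in 1..254)."""
    if m1 == lsb(x1):
        # First bit already matches; fix the second bit via x2 if needed.
        y1 = x1
        y2 = x2 if m2 == f(x1, x2) else x2 + 1  # +1 chosen for simplicity
    else:
        # Change x1 by +/-1, picking the direction that also encodes m2.
        y1 = x1 - 1 if m2 == f(x1 - 1, x2) else x1 + 1
        y2 = x2
    return y1, y2


def extract_pair(y1, y2):
    """Recover the two embedded bits from a stego pixel pair."""
    return lsb(y1), f(y1, y2)
```

A quick usage check: `extract_pair(*embed_pair(100, 57, 1, 0))` returns the embedded bits `(1, 0)`.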
Funding: Supported by the National Natural Science Foundation of China.
Abstract: We use the wavelet transform to analyze the daily relative sunspot number series over solar cycles 10-23. The characteristics of some of the periods shorter than ~600 days are discussed. The results exhibit not only the variation of some short periods over the 14 solar cycles but also their characteristics and differences around solar peak and valley years. The short periodic components with larger amplitude, such as ~27, ~150 and ~360 days, are obvious in some solar cycles; all of them are time-variable, and their lengths and amplitudes are variable and intermittent in time. The variable characteristics of the periods differ considerably between solar cycles.
Abstract: An outsourced database is a database service provided by cloud computing companies. Using an outsourced database can reduce hardware and software costs and provide more efficient and reliable data processing capacity. However, the outsourced database still faces some challenges. If the service provider is not sufficiently trustworthy, data leakage is possible. The data may contain users' private information, so data leakage may cause a privacy leak. For this reason, protecting the privacy of data in the outsourced database becomes very important. In the past, scholars proposed k-anonymity to protect data privacy in the database. It makes data anonymous to avoid privacy leaks. But k-anonymity has some problems: it is irreversible, and it is vulnerable to the homogeneity attack and the background knowledge attack. Later studies addressed the homogeneity attack and the background knowledge attack, but they still cannot recover the original data. In this paper, we propose a data anonymity method that is reversible and also prevents those two attacks. Our study is based on the proposed r-transform, which can be applied to numeric attributes in the outsourced database. In the experiments, we discuss the time required to anonymize and recover data. Furthermore, we investigate the defense against the homogeneity attack and the background knowledge attack. Finally, we summarize the proposed method and future research.
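Abstract: Short-term forecasting plays an important role in smart grid construction and profoundly affects the intelligent upgrading of every stage of the grid: generation, transmission, transformation, distribution and consumption. Short-term forecasting generally relies on measured system data, but sensor faults, data transmission errors and similar causes degrade data quality and seriously impair forecasting accuracy. To build an accurate short-term forecasting model under degraded data quality, a short-term forecasting framework that combines data preprocessing with bi-directional long short-term memory (Bi-LSTM), named Bi-LSTM-DP (bi-directional long short-term memory data preprocessing), is proposed. In Bi-LSTM-DP, missing values in the collected data are first filled with the mean, the data are then denoised with a Savitzky-Golay filter, and finally Bi-LSTM extracts the information in the time series to produce the short-term forecast. To evaluate the performance of the proposed method, measured public datasets are used to forecast wind power generation and load demand, respectively; comparison with other reference methods demonstrates the effectiveness and robustness of the proposed method.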
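The two preprocessing steps named in the abstract above, mean imputation followed by Savitzky-Golay smoothing, can be sketched in plain Python as below. This is an illustration rather than the paper's pipeline: the window length (5), the polynomial order (2), the choice to leave the two points at each edge unsmoothed, and the function names are all assumptions, and the Bi-LSTM forecasting stage is not shown. The coefficients (-3, 12, 17, 12, -3)/35 are the standard ones for a 5-point quadratic Savitzky-Golay filter.

```python
def fill_missing_with_mean(series):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else float(v) for v in series]


# Standard 5-point, quadratic Savitzky-Golay smoothing coefficients.
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0


def savgol_smooth_5pt(series):
    """Smooth interior points with a 5-point quadratic Savitzky-Golay filter.

    The first and last two points are copied unchanged (an assumed edge rule).
    """
    n = len(series)
    smoothed = list(series)
    for i in range(2, n - 2):
        window = series[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / SG_NORM
    return smoothed


def preprocess(series):
    """Mean imputation followed by Savitzky-Golay denoising."""
    return savgol_smooth_5pt(fill_missing_with_mean(series))
```

A quadratic filter reproduces any locally quadratic signal exactly, so smooth trends pass through unchanged while point-to-point noise is attenuated; the cleaned series would then be fed to the forecasting model.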