Funding: This work was supported by the National Natural Science Foundation of China [grant numbers 41222009, 41271405].
Abstract: Spatial selectivity estimation is crucial for a query optimizer to choose the cheapest execution plan for a given query. This article proposes an accurate spatial selectivity estimation method based on cumulative density (CD) histograms, which can handle arbitrary spatial query windows. In this method, the selectivity is estimated using the original logic of the CD histogram after the four corner values of the query window have been accurately interpolated on the continuous surface of the elevation histogram. To interpolate a corner point, we first identify the cells that can affect the value of the point (x, y) in the CD histogram. These cells fall into two classes: those within the range from (0, 0) to (x, y), and those overlapping the range from (0, 0) to (x, y). The values of the former class can be used directly, whereas the values of cells in the latter class are revised according to the number of vertices in the corresponding cell and the area ratio covered by the range from (0, 0) to (x, y). This revision makes the estimation more accurate. The CD histograms and the estimation method have been implemented in INGRES. Experimental results show that the method can accurately estimate the selectivity of arbitrary query windows and can help the optimizer choose a cheaper query plan.
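The corner interpolation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' INGRES implementation. It assumes point data on a uniform grid anchored at (0, 0), and it assumes a uniform distribution inside each cell, so a partially covered cell contributes its count scaled by the covered area ratio. All function and parameter names (`build_cell_counts`, `cd_value`, `estimate_selectivity`, `width`, `height`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def build_cell_counts(points: Sequence[Point], nx: int, ny: int,
                      width: float, height: float) -> List[List[int]]:
    """Count the points that fall in each cell of an nx-by-ny grid."""
    counts = [[0] * ny for _ in range(nx)]
    for x, y in points:
        # Clamp so points on the upper boundary land in the last cell.
        i = min(int(x / width * nx), nx - 1)
        j = min(int(y / height * ny), ny - 1)
        counts[i][j] += 1
    return counts


def cd_value(counts: List[List[int]], x: float, y: float,
             width: float, height: float) -> float:
    """Interpolated cumulative count for the range (0, 0) to (x, y).

    Fully covered cells contribute their whole count; partially covered
    cells contribute count * covered area ratio (assumed uniformity).
    """
    nx, ny = len(counts), len(counts[0])
    cell_w, cell_h = width / nx, height / ny
    total = 0.0
    for i in range(nx):
        # Fraction of cell column i covered along the x axis, in [0, 1].
        fx = min(max((x - i * cell_w) / cell_w, 0.0), 1.0)
        if fx == 0.0:
            continue
        for j in range(ny):
            # Fraction of cell row j covered along the y axis, in [0, 1].
            fy = min(max((y - j * cell_h) / cell_h, 0.0), 1.0)
            total += counts[i][j] * fx * fy
    return total


def estimate_selectivity(counts: List[List[int]],
                         window: Tuple[float, float, float, float],
                         width: float, height: float) -> float:
    """Estimate selectivity from the four interpolated corner values."""
    x1, y1, x2, y2 = window
    n = sum(sum(row) for row in counts)
    if n == 0:
        return 0.0
    hits = (cd_value(counts, x2, y2, width, height)
            - cd_value(counts, x1, y2, width, height)
            - cd_value(counts, x2, y1, width, height)
            + cd_value(counts, x1, y1, width, height))
    return hits / n


if __name__ == "__main__":
    pts = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
    grid = build_cell_counts(pts, 2, 2, 2.0, 2.0)
    print(estimate_selectivity(grid, (0.5, 0.5, 1.5, 1.5), 2.0, 2.0))
```

The four-corner combination is the usual inclusion-exclusion over cumulative values. The sketch recomputes each corner from raw cell counts for clarity, whereas a real histogram would store the cumulative values and only correct the partially covered cells.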
Funding: Supported by the Innovation Project of IGSNRR (No. O9V90220ZZ) and the Research Plan of LREIS (O88RA700KA), CAS.
Abstract: Selectivity estimation is crucial for query optimizers choosing an optimal spatial execution plan in a spatial database management system. This paper presents an Annular Bucket spatial histogram (AB histogram) that can estimate selectivity for finer spatial selection and spatial join operations, even when the spatial query has more operators or more joins. The AB histogram is represented as a set of (bucket-range, bucket-count) value pairs. The bucket-range often covers an annular region shaped like a single-cell-sized photo frame. The bucket-count is the number of objects whose Minimum Bounding Rectangles (MBRs) fall between the outer rectangle and the inner rectangle of the bucket-range. Assuming that the MBRs in each bucket are evenly distributed, for every bucket we can obtain a series of probabilities of satisfying a given spatial selection or join condition from the semantics of the operations and the spatial relations between each bucket-range and the query ranges. Thus, according to probability theory, the selectivity of a spatial selection or join can be estimated from each bucket-count and its probabilities. This paper also shows a way to generate an updated AB histogram from an original AB histogram and those probabilities. Our tests show that the AB histogram not only supports selectivity estimation for spatial selections or spatial joins with the "disjoint", "intersect", "within", "contains", and "overlap" operators, but also provides an approach to generating a reliable updated histogram whose spatial distribution is close to the distribution of the actual query result.
Abstract: In statistical practice, it is not uncommon to use the same data for both model selection and inference without taking into account the variability induced by the model selection step. This is usually referred to as post-model selection inference. The shortcomings of this practice are widely recognized, but finding a general solution is extremely challenging. We propose a model averaging alternative that takes into account both the model selection probability and the likelihood when assigning the weights. The approach is applied to Bernoulli trials and outperforms Akaike weights model averaging and post-model selection estimators.
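For context, the Akaike-weights baseline named in this abstract can be sketched in Python for Bernoulli trials. This shows only the comparison method, not the proposed weighting, whose details the abstract does not give. The two candidate models (a fixed success probability of 0.5 with no free parameter, and a free success probability estimated by maximum likelihood) are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def bernoulli_loglik(successes: int, trials: int, p: float) -> float:
    """Log-likelihood of the data under success probability p."""
    failures = trials - successes
    loglik = 0.0
    # Skip zero-count terms so that 0 * log(0) is treated as 0.
    if successes:
        loglik += successes * math.log(p)
    if failures:
        loglik += failures * math.log(1.0 - p)
    return loglik


def akaike_weights(aics: List[float]) -> List[float]:
    """Convert AIC values into normalized Akaike weights."""
    best = min(aics)
    raw = [math.exp(-(aic - best) / 2.0) for aic in aics]
    total = sum(raw)
    return [r / total for r in raw]


def averaged_estimate(successes: int, trials: int,
                      p_fixed: float = 0.5) -> Tuple[float, List[float]]:
    """Model-averaged estimate of p over the two illustrative models."""
    p_mle = successes / trials
    estimates = [p_fixed, p_mle]
    n_params = [0, 1]  # fixed-p model has no free parameter
    aics = [2 * k - 2 * bernoulli_loglik(successes, trials, p)
            for k, p in zip(n_params, estimates)]
    weights = akaike_weights(aics)
    estimate = sum(w * p for w, p in zip(weights, estimates))
    return estimate, weights


if __name__ == "__main__":
    est, w = averaged_estimate(successes=7, trials=10)
    print(est, w)
```

A post-model selection estimator would instead report only the estimate of the lowest-AIC model. The averaged estimate always lies between the two candidate estimates.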