期刊文献+
共找到5篇文章
< 1 >
每页显示 20 50 100
Embedding-based approximate query for knowledge graph
1
作者 Qiu Jingyi Zhang Duxi +5 位作者 Song Aibo Wang Honglin Zhang Tianbo Jin Jiahui Fang Xiaolin Li Yaqi 《Journal of Southeast University(English Edition)》 EI CAS 2024年第4期417-424,共8页
To solve the low efficiency of approximate queries caused by the large sizes of the knowledge graphs in the real world,an embedding-based approximate query method is proposed.First,the nodes in the query graph are cla... To solve the low efficiency of approximate queries caused by the large sizes of the knowledge graphs in the real world,an embedding-based approximate query method is proposed.First,the nodes in the query graph are classified according to the degrees of approximation required for different types of nodes.This classification transforms the query problem into three constraints,from which approximate information is extracted.Second,candidates are generated by calculating the similarity between embeddings.Finally,a deep neural network model is designed,incorporating a loss function based on the high-dimensional ellipsoidal diffusion distance.This model identifies the distance between nodes using their embeddings and constructs a score function.k nodes are returned as the query results.The results show that the proposed method can return both exact results and approximate matching results.On datasets DBLP(DataBase systems and Logic Programming)and FUA-S(Flight USA Airports-Sparse),this method exhibits superior performance in terms of precision and recall,returning results in 0.10 and 0.03 s,respectively.This indicates greater efficiency compared to PathSim and other comparative methods. 展开更多
关键词 approximate query knowledge graph EMBEDDING deep neural network
在线阅读 下载PDF
APPROXIMATE QUERY AND CALCULATION OF RNN_k BASED ON VORONOI CELL 被引量:1
2
作者 郝忠孝 李博涵 《Transactions of Nanjing University of Aeronautics and Astronautics》 EI 2009年第2期154-161,共8页
Reverse k nearest neighbor (RNNk) is a generalization of the reverse nearest neighbor problem and receives increasing attention recently in the spatial data index and query. RNNk query is to retrieve all the data po... Reverse k nearest neighbor (RNNk) is a generalization of the reverse nearest neighbor problem and receives increasing attention recently in the spatial data index and query. RNNk query is to retrieve all the data points which use a query point as one of their k nearest neighbors. To answer the RNNk of queries efficiently, the properties of the Voronoi cell and the space-dividing regions are applied. The RNNk of the given point can be found without computing its nearest neighbors every time by using the rank Voronoi cell. With the elementary RNNk query result, the candidate data points of reverse nearest neighbors can he further limited by the approximation with sweepline and the partial extension of query region Q. The approximate minimum average distance (AMAD) can be calculated by the approximate RNNk without the restriction of k. Experimental results indicate the efficiency and the effectiveness of the algorithm and the approximate method in three varied data distribution spaces. The approximate query and the calculation method with the high precision and the accurate recall are obtained by filtrating data and pruning the search space. 展开更多
关键词 computational geometry approximation query filtrating reverse k nearest neighbor (RNNk) Voronoi cell
在线阅读 下载PDF
AbIx: An Approach to Content-Based Approximate Query Processing in Peer-to-Peer Data Systems
3
作者 王朝坤 王建民 +2 位作者 孙家广 石胜飞 高宏 《Journal of Computer Science & Technology》 SCIE EI CSCD 2007年第2期280-286,共7页
In recent years there has been a significant interest in peer-to-peer (P2P) environments in the community of data management. However, almost all work, so far, is focused on exact query processing in current P2P dat... In recent years there has been a significant interest in peer-to-peer (P2P) environments in the community of data management. However, almost all work, so far, is focused on exact query processing in current P2P data systems. The autonomy of peers also is not considered enough. In addition, the system cost is very high because the information publishing method of shared data is based on each document instead of document set. In this paper, abstract indices (AbIx) are presented to implement content-based approximate queries in centralized, distributed and structured P2P data systems. It can be used to search as few peers as possible but get as many returns satisfying users' queries as possible on the guarantee of high autonomy of peers. Also, abstract indices have low system cost, can improve the query processing speed, and support very frequent updates and the set information publishing method. In order to verify the effectiveness of abstract indices, a simulator of 10,000 peers, over 3 million documents is made, and several metrics are proposed. The experimental results show that abstract indices work well in various P2P data systems. 展开更多
关键词 approximate query processing content-based information retrieval peer-to-peer data systems abstract indices
原文传递
Improved Approximate Detection of Duplicates for Data Streams Over Sliding Windows 被引量:3
4
作者 沈鸿 张育 《Journal of Computer Science & Technology》 SCIE EI CSCD 2008年第6期973-987,共15页
Detecting duplicates in data streams is an important problem that has a wide range of applications. In general, precisely detecting duplicates in an unbounded data stream is not feasible in most streaming scenarios, a... Detecting duplicates in data streams is an important problem that has a wide range of applications. In general, precisely detecting duplicates in an unbounded data stream is not feasible in most streaming scenarios, and, on the other hand, the elements in data streams are always time sensitive. These make it particular significant approximately detecting duplicates among newly arrived elements of a data stream within a fixed time frame. In this paper, we present a novel data structure, Decaying Bloom Filter (DBF), as an extension of the Counting Bloom Filter, that effectively removes stale elements as new elements continuously arrive over sliding windows. On the DBF basis we present an efficient algorithm to approximately detect duplicates over sliding windows. Our algorithm may produce false positive errors, but not false negative errors as in many previous results. We analyze the time complexity and detection accuracy, and give a tight upper bound of false positive rate. For a given space G bits and sliding window size W, our algorithm has an amortized time complexity of O(√G/W). Both analytical and experimental results on synthetic data demonstrate that our algorithm is superior in both execution time and detection accuracy to the previous results. 展开更多
关键词 data stream duplicate detection bloom filter approximate query sliding window
原文传递
Minimum Epsilon-Kernel Computation for Large-Scale Data Processing
5
作者 Hong-Jie Guo Jian-Zhong Li Hong Gao 《Journal of Computer Science & Technology》 SCIE EI CSCD 2022年第6期1398-1411,共14页
Kernel is a kind of data summary which is elaborately extracted from a large dataset.Given a problem,the solution obtained from the kernel is an approximate version of the solution obtained from the whole dataset with... Kernel is a kind of data summary which is elaborately extracted from a large dataset.Given a problem,the solution obtained from the kernel is an approximate version of the solution obtained from the whole dataset with a provable approximate ratio.It is widely used in geometric optimization,clustering,and approximate query processing,etc.,for scaling them up to massive data.In this paper,we focus on the minimumε-kernel(MK)computation that asks for a kernel of the smallest size for large-scale data processing.For the open problem presented by Wang et al.that whether the minimumε-coreset(MC)problem and the MK problem can be reduced to each other,we first formalize the MK problem and analyze its complexity.Due to the NP-hardness of the MK problem in three or higher dimensions,an approximate algorithm,namely Set Cover-Based Minimumε-Kernel algorithm(SCMK),is developed to solve it.We prove that the MC problem and the MK problem can be Turing-reduced to each other.Then,we discuss the update of MK under insertion and deletion operations,respectively.Finally,a randomized algorithm,called the Randomized Algorithm of Set Cover-Based Minimumε-Kernel algorithm(RA-SCMK),is utilized to further reduce the complexity of SCMK.The efficiency and effectiveness of SCMK and RA-SCMK are verified by experimental results on real-world and synthetic datasets.Experiments show that the kernel sizes of SCMK are 2x and 17.6x smaller than those of an ANN-based method on real-world and synthetic datasets,respectively.The speedup ratio of SCMK over the ANN-based method is 5.67 on synthetic datasets.RA-SCMK runs up to three times faster than SCMK on synthetic datasets. 展开更多
关键词 approximate query processing KERNEL large-scale dataset NP-HARD
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部