期刊文献+
共找到6篇文章
< 1 >
每页显示 20 50 100
Efficient graph similarity join for information integration on graphs 被引量:5
1
作者 Yue WANG Hongzhi WANG +1 位作者 Jianzhong LI Hong GAO 《Frontiers of Computer Science》 SCIE EI CSCD 2016年第2期317-329,共13页
Graphs have been widely used for complex data representation in many real applications, such as social network, bioinformatics, and computer vision. Therefore, graph similarity join has become imperative for integrati... Graphs have been widely used for complex data representation in many real applications, such as social network, bioinformatics, and computer vision. Therefore, graph similarity join has become imperative for integrating noisy and inconsistent data from multiple data sources. The edit distance is commonly used to measure the similarity between graphs. The graph similarity join problem studied in this paper is based on graph edit distance constraints. To accelerate the similarity join based on graph edit distance, in the paper, we make use of a preprocessing strategy to remove the mismatching graph pairs with significant differences. Then a novel method of building indexes for each graph is proposed by grouping the nodes which can be reached in k hops for each key node with structure conservation, which is the k-hop tree based indexing method. As for each candidate pair, we propose a similarity computation algorithm with boundary filtering, which can be applied with good efficiency and effectiveness. Experiments on real and synthetic graph databases also confirm that our method can achieve good join quality in graph similarity join. Besides, the join process can be finished in polynomial time. 展开更多
关键词 graph similarity join edit distance constraint khop tree based indexing structure conservation boundary filtering
原文传递
String similarity join with different similarity thresholds based on novel indexing techniques 被引量:2
2
作者 Chuitian RONG Yasin N. SILVA Chunqing LI 《Frontiers of Computer Science》 SCIE EI CSCD 2017年第2期307-319,共13页
String similarity join is an essential operation of many applications that need to find all similar string pairs from two given collections. A quantitative way to determine whether two strings are similar is to comput... String similarity join is an essential operation of many applications that need to find all similar string pairs from two given collections. A quantitative way to determine whether two strings are similar is to compute their similarity based on a certain similarity function. The string pairs with similarity above a certain threshold are regarded as results. The current approach to solving the similarity join problem is to use a unique threshold value. There are, however, several scenarios that require the support of multiple thresholds, for instance, when the dataset includes strings of various lengths. In this scenario, longer string pairs typically tolerate much more typos than shorter ones. Therefore, we proposed a so- lution for string similarity joins that supports different simi- larity thresholds in a single operator. In order to support dif- ferent thresholds, we devised two novel indexing techniques: partition based indexing and similarity aware indexing. To utilize the new indices and improve the join performance, we proposed new filtering methods and index probing tech- niques. To the best of our knowledge, this is the first work that addresses this problem. Experimental results on real-world datasets show that our solution performs efficiently while pro- viding a more flexible threshold specification. 展开更多
关键词 similarity join similarity aware index similarity thresholds
原文传递
FrepJoin:an efficient partition-based algorithm for edit similarity join
3
作者 Ji-zhou LUO Sheng-fei SHI +1 位作者 Hong-zhi WANG Jian-zhong LI 《Frontiers of Information Technology & Electronic Engineering》 SCIE EI CSCD 2017年第10期1499-1510,共12页
String similarity join(SSJ) is essential for many applications where near-duplicate objects need to be found. This paper targets SSJ with edit distance constraints. The existing algorithms usually adopt the filter-and... String similarity join(SSJ) is essential for many applications where near-duplicate objects need to be found. This paper targets SSJ with edit distance constraints. The existing algorithms usually adopt the filter-andrefine framework. They cannot catch the dissimilarity between string subsets, and do not fully exploit the statistics such as the frequencies of characters. We investigate to develop a partition-based algorithm by using such statistics.The frequency vectors are used to partition datasets into data chunks with dissimilarity between them being caught easily. A novel algorithm is designed to accelerate SSJ via the partitioned data. A new filter is proposed to leverage the statistics to avoid computing edit distances for a noticeable proportion of candidate pairs which survive the existing filters. Our algorithm outperforms alternative methods notably on real datasets. 展开更多
关键词 String similarity join Edit distance Filter and refine Data partition Combined frequency vectors
原文传递
Indoor Uncertain Semantic Trajectory Similarity Join
4
作者 Hong-Bo Yin Dong-Hua Yang +2 位作者 Kai-Qi Zhang Hong Gao Jian-Zhong Li 《Journal of Computer Science & Technology》 CSCD 2024年第6期1441-1465,共25页
With the widespread deployment of indoor positioning systems, an unprecedented scale of indoor trajectories is being produced. By considering the inherent uncertainties and the text information contained in such an in... With the widespread deployment of indoor positioning systems, an unprecedented scale of indoor trajectories is being produced. By considering the inherent uncertainties and the text information contained in such an indoor trajectory, a new definition named Indoor Uncertain Semantic Trajectory is defined in this paper. In this paper, we focus on a new primitive, yet quite essential query named Indoor Uncertain Semantic Trajectory Similarity Join (IUST-Join for short), which is to match all similar pairs of indoor uncertain semantic trajectories from two sets. IUST-Join targets a number of essential indoor applications. With these applications in mind, we provide a purposeful definition of an indoor uncertain semantic trajectory similarity metric named IUS. To process IUST-Join more efficiently, both an inverted index on indoor uncertain semantic trajectories named 3IST and the first acceleration strategy are proposed to form a filtering-and-verification framework, where most invalid pairs of indoor uncertain semantic trajectories are pruned at quite low computation cost. And based on this filtering-and-verification framework, we present a highly-efficient algorithm named Indoor Uncertain Semantic Trajectory Similarity Join Processing (USP for short). In addition, lots of novel and effective acceleration strategies are proposed and embedded in the USP algorithm. Thanks to these techniques, both the time complexity and the time overhead of the USP algorithm are further reduced. The results of extensive experiments demonstrate the superior performance of the proposed work. 展开更多
关键词 filtering-and-verification framework indoor uncertain semantic trajectory inverted index trajectory similarity join
原文传递
Refill friction stir spot welding (RFSSW): a review of processing, similar/dissimilar materials joining, mechanical properties and fracture mechanism
5
作者 Yun-qiang Zhao Hao-kun Yang +2 位作者 Aloshyn Andriia Hong-hang Lo Jia-xin Li 《Journal of Iron and Steel Research International》 SCIE EI CAS CSCD 2024年第8期1825-1839,共15页
Refill friction stir spot welding(RFSSW)provides a novel method to join similar and/or dissimilar metallic materials without a key-hole in the center of the joint.Having the key-hole free characterization,the similar/... Refill friction stir spot welding(RFSSW)provides a novel method to join similar and/or dissimilar metallic materials without a key-hole in the center of the joint.Having the key-hole free characterization,the similar/dissimilar RFSSW joint exhibits remarkable and endurable characteristics,including high shear strength,long fatigue life,and strong corrosion resistance.In the meanwhile,as the key-hole free joint has different microstructures compared with conventional friction stir spot welding,thus the RFSSW joint shall possess different shear and fatigue fracture mechanisms,which needs further investigation.To explore the underlying failure mechanism,the similar/dissimilar metallic material joining parameters and pre-treatment,mechanical properties,as well as fracture mechanisms under this novel technology will be discussed.In details,the welding tool design,welding parameters setting,and the influence of processing on the lap shear and fatigue properties,as well as the corrosion resistance will be mainly discussed.Moreover,the roadmap of RFFSW is also discussed. 展开更多
关键词 Refill friction stir spot welding PROCESSING MECHANISM Similar materials joining Dissimilar materials joining MICROSTRUCTURE Mechanical property Fracture mechanism
原文传递
String similarity search and join: a survey 被引量:4
6
作者 Minghe YU Guoliang LI +1 位作者 Dong DENG Jianhua FENG 《Frontiers of Computer Science》 SCIE EI CSCD 2016年第3期399-417,共19页
String similarity search and join are two impor- tant operations in data cleaning and integration, which ex- tend traditional exact search and exact join operations in databases by tolerating the errors and inconsiste... String similarity search and join are two impor- tant operations in data cleaning and integration, which ex- tend traditional exact search and exact join operations in databases by tolerating the errors and inconsistencies in the data. They have many real-world applications, such as spell checking, duplicate detection, entity resolution, and webpage clustering. Although these two problems have been exten- sively studied in the recent decade, there is no thorough sur- vey. In this paper, we present a comprehensive survey on string similarity search and join. We first give the problem definitions and introduce widely-used similarity functions to quantify the similarity. We then present an extensive set of algorithms for siring similarity search and join. We also dis- cuss their variants, including approximate entity extraction, type-ahead search, and approximate substring matching. Fi- nally, we provide some open datasets and summarize some research challenges and open problems. 展开更多
关键词 string similarity similarity search similarity join TOP-K
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部