5 articles found
Topological Features Based Entity Disambiguation (cited by: 1)
1
Authors: Chen-Chen Sun, De-Rong Shen, Yue Kou, Tie-Zheng Nie, Ge Yu. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2016, Issue 5, pp. 1053-1068 (16 pages)
This work proposes an unsupervised topological features based entity disambiguation solution. Most existing studies leverage semantic information to resolve ambiguous references. However, the semantic information is not always accessible because of privacy or is too expensive to access. We consider the problem in a setting that only relationships between references are available. A structure similarity algorithm via random walk with restarts is proposed to measure the similarity of references. The disambiguation is regarded as a clustering problem and a family of graph walk based clustering algorithms are brought to group ambiguous references. We evaluate our solution extensively on two real datasets and show its advantage over two state-of-the-art approaches in accuracy.
Keywords: entity disambiguation; topological feature; clustering; random walk with restarts
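To illustrate the random walk with restarts named in this entry, here is a minimal sketch of the generic technique on a small undirected graph. It is not the authors' structure similarity algorithm. The function name `random_walk_with_restart`, the restart probability of 0.15, and the toy graph are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Return each node's long-run visiting probability for a walker that
    jumps back to `seed` with probability `restart` at every step.

    `adj` maps every node to a list of its neighbours (hypothetical format).
    """
    nodes = list(adj)
    scores = {n: 0.0 for n in nodes}
    scores[seed] = 1.0  # the walker starts at the seed
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            moving = (1.0 - restart) * scores[u]
            if not adj[u]:
                # Dangling node: return its walking mass to the seed.
                nxt[seed] += moving
                continue
            share = moving / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # Scores always sum to 1, so the restart mass is exactly `restart`.
        nxt[seed] += restart
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Toy graph (assumed): two triangles, a-b-c and d-e-f, joined by the edge c-d.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}
scores_from_a = random_walk_with_restart(toy_graph, "a")
```

Nodes in the seed's own triangle receive higher scores than nodes across the bridge. That is why such scores can serve as a purely structural similarity signal when no semantic attributes are available.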
Finding Communities by Decomposing and Embedding Heterogeneous Information Network
2
Authors: Yue Kou, De-Rong Shen, Dong Li, Tie-Zheng Nie, Ge Yu. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2020, Issue 2, pp. 320-337 (18 pages)
Community discovery is an important task in social network analysis. However, most existing methods for community discovery rely on the topological structure alone. These methods ignore the rich information available in the content data. In order to solve this issue, in this paper, we present a community discovery method based on heterogeneous information network decomposition and embedding. Unlike traditional methods, our method takes into account topology, node content and edge content, which can supply abundant evidence for community discovery. First, an embedding-based similarity evaluation method is proposed, which decomposes the heterogeneous information network into several subnetworks, and extracts their potential deep representation to evaluate the similarities between nodes. Second, a bottom-up community discovery algorithm is proposed. Via leader nodes selection, initial community generation, and community expansion, communities can be found more efficiently. Third, some incremental maintenance strategies for the changes of networks are proposed. We conduct experimental studies based on three real-world social networks. Experiments demonstrate the effectiveness and the efficiency of our proposed method. Compared with the traditional methods, our method improves normalized mutual information (NMI) and the modularity by an average of 12% and 37% respectively.
Keywords: community discovery; heterogeneous information network; decomposition; embedding; incremental maintenance
Mixed Hierarchical Networks for Deep Entity Matching
3
Authors: Chen-Chen Sun, De-Rong Shen. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2021, Issue 4, pp. 822-838 (17 pages)
Entity matching is a fundamental problem of data integration. It groups records according to underlying real-world entities. There is a growing trend of entity matching via deep learning techniques. We design mixed hierarchical deep neural networks (MHN) for entity matching, exploiting semantics from different abstract levels in the record internal hierarchy. A family of attention mechanisms is utilized in different periods of entity matching. Self-attention focuses on internal dependency, inter-attention targets alignments, and multi-perspective weight attention is devoted to importance discrimination. In particular, hybrid soft token alignment is proposed to address corrupted data. Attribute order is for the first time considered in deep entity matching. Then, to reduce utilization of labeled training data, we propose an adversarial domain adaption approach (DA-MHN) to transfer matching knowledge between different entity matching tasks by maximizing classifier discrepancy. Finally, we conduct comprehensive experimental evaluations on 10 datasets (seven for MHN and three for DA-MHN), which illustrate our two proposed approaches' superiorities. MHN apparently outperforms previous studies in accuracy, and also each component of MHN is tested. DA-MHN greatly surpasses existing studies in transferability.
Keywords: entity matching; attention mechanism; mixed hierarchical neural network (MHN); domain adaption; data integration
Content-Related Repairing of Inconsistencies in Distributed Data
4
Authors: Yue-Feng Du, De-Rong Shen, Tie-Zheng Nie, Yue Kou, Ge Yu. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2016, Issue 4, pp. 741-758 (18 pages)
Conditional functional dependencies (CFDs) are a critical technique for detecting inconsistencies, but they may miss some potential inconsistencies because they do not consider the content relationship of data. Content-related conditional functional dependencies (CCFDs) are a special type of CFDs, which combine content-related CFDs and detect potential inconsistencies by putting content-related data together. In the process of cleaning inconsistencies, detection and repairing are interactive: 1) detection catches inconsistencies, 2) repairing corrects caught inconsistencies but may introduce new inconsistencies. Besides, data are often fragmented and distributed across multiple sites, so cleaning inconsistencies incurs expensive data shipment. In this paper, our aim is to repair inconsistencies in distributed content-related data. We propose a framework consisting of an inconsistency detection method and an inconsistency repairing method, which work iteratively. The detection method marks the violated CCFDs for computing the inconsistencies which should be repaired preferentially. Based on the repairing-cost model presented in this paper, we prove that the minimum-cost repairing using CCFDs is NP-complete. Therefore, the repairing method heuristically repairs the inconsistencies with minimum cost. To improve the efficiency and accuracy of repairing, we propose distinct values and rules sequences. Distinct values require less data shipment than real data for communication. Rules sequences determine appropriate repairing sequences to avoid some incorrect repairs. Our solution is shown to be more effective than CFDs by empirical evaluation on two real-life datasets.
Keywords: data quality management; distributed consistency; content relativity; consistency repairing
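This entry builds on conditional functional dependencies. The sketch below shows how a single plain CFD can flag inconsistent rows on one site. It does not cover the paper's CCFDs, its cost model, or its distributed repairing. The function name `cfd_violations`, the `"_"` wildcard convention, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def cfd_violations(rows, lhs, rhs, pattern):
    """Return sorted indices of rows violating the CFD (lhs -> rhs, pattern).

    `pattern` maps each attribute in `lhs` and `rhs` to a required constant,
    or to "_" meaning "any value" (hypothetical encoding of a pattern tuple).
    """
    violations = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        # The dependency only constrains rows matching the left-hand constants.
        if any(pattern[a] != "_" and row[a] != pattern[a] for a in lhs):
            continue
        # A constant on the right-hand side can be violated by a single row.
        if pattern[rhs] != "_" and row[rhs] != pattern[rhs]:
            violations.add(i)
        groups[tuple(row[a] for a in lhs)].append(i)
    # Rows that agree on the left-hand side must agree on the right-hand side.
    for indices in groups.values():
        if len({rows[i][rhs] for i in indices}) > 1:
            violations.update(indices)
    return sorted(violations)


# Assumed sample data: within the UK, a zip code determines the street.
sample_rows = [
    {"country": "UK", "zip": "EH4", "street": "Mayfield"},
    {"country": "UK", "zip": "EH4", "street": "Crichton"},
    {"country": "US", "zip": "07974", "street": "Mountain Ave"},
    {"country": "US", "zip": "07974", "street": "Main St"},
    {"country": "UK", "zip": "W1B", "street": "Regent St"},
]
uk_zip_rule = {"country": "UK", "zip": "_", "street": "_"}
bad_rows = cfd_violations(sample_rows, ["country", "zip"], "street", uk_zip_rule)
```

Only the two UK rows sharing zip code EH4 are flagged. The two US rows also disagree on the street, but the rule does not apply to them because the pattern restricts it to `country = "UK"`.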
Incremental User Identification Across Social Networks Based on User-Guider Similarity Index
5
Authors: Yue Kou, Dong Li, De-Rong Shen, Tie-Zheng Nie, Ge Yu. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2022, Issue 5, pp. 1086-1104 (19 pages)
Identifying accounts across different online social networks that belong to the same user has attracted extensive attention. However, existing techniques rely on given user seeds and ignore the dynamic changes of online social networks, and thus fail to generate high quality identification results. In order to solve this problem, we propose an incremental user identification method based on user-guider similarity index (called CURIOUS), which efficiently identifies users and well captures the changes of user features over time. Specifically, we first construct a novel user-guider similarity index (called USI) to speed up the matching between users. Second, we propose a two-phase user identification strategy consisting of USI-based bidirectional user matching and seed-based user matching, which is effective even for incomplete networks. Finally, we propose incremental maintenance for both USI and the identification results, which dynamically captures the instant states of social networks. We conduct experimental studies based on three real-world social networks. The experiments demonstrate the effectiveness and the efficiency of our proposed method in comparison with traditional methods. Compared with the traditional methods, our method improves precision, recall and rank score by an average of 0.19, 0.16 and 0.09 respectively, and reduces the time cost by an average of 81%.
Keywords: user identification; social network; user-guider similarity index; incremental maintenance