针对稀疏子图发现问题中使用高维稀疏向量表示网络信息存在的时间和空间消耗大的问题,提出一种基于网络嵌入的稀疏子图发现(TGF)算法。该算法首先通过网络嵌入的方法将网络结构映射到低维空间中,得到节点的低维向量表示;然后定义向量空...针对稀疏子图发现问题中使用高维稀疏向量表示网络信息存在的时间和空间消耗大的问题,提出一种基于网络嵌入的稀疏子图发现(TGF)算法。该算法首先通过网络嵌入的方法将网络结构映射到低维空间中,得到节点的低维向量表示;然后定义向量空间中的稀疏子集发现问题,将稀疏子图发现问题转化为稀疏子集发现问题;迭代搜索局部密度最低的样本点并对其进行扩张,最终找到一个满足条件的最大稀疏子集。实验结果表明,在Synthetic_1000数据集上与TERA(Triangle and Edge Reduction Algorithm)和WK(Weight of K-hop)算法相比,TGF算法的搜索效率是TERA的1 353倍,是WK算法的4倍,并且在k-line、k-triangle和k-density指标上也取得了较优的结果。展开更多
How to integrate heterogeneous semi-structured Web records into relational database is an important and challengeable research topic. An improved model of conditional random fields was presented to combine the learnin...How to integrate heterogeneous semi-structured Web records into relational database is an important and challengeable research topic. An improved model of conditional random fields was presented to combine the learning of labeled samples and unlabeled database records in order to reduce the dependence on tediously hand-labeled training data. The pro- posed model was used to solve the problem of schema matching between data source schema and database schema. Experimental results using a large number of Web pages from diverse domains show the novel approach's effectiveness.展开更多
文摘针对稀疏子图发现问题中使用高维稀疏向量表示网络信息存在的时间和空间消耗大的问题,提出一种基于网络嵌入的稀疏子图发现(TGF)算法。该算法首先通过网络嵌入的方法将网络结构映射到低维空间中,得到节点的低维向量表示;然后定义向量空间中的稀疏子集发现问题,将稀疏子图发现问题转化为稀疏子集发现问题;迭代搜索局部密度最低的样本点并对其进行扩张,最终找到一个满足条件的最大稀疏子集。实验结果表明,在Synthetic_1000数据集上与TERA(Triangle and Edge Reduction Algorithm)和WK(Weight of K-hop)算法相比,TGF算法的搜索效率是TERA的1 353倍,是WK算法的4倍,并且在k-line、k-triangle和k-density指标上也取得了较优的结果。
基金Supported by the National Defense Pre-ResearchFoundation of China(4110105018)
文摘How to integrate heterogeneous semi-structured Web records into relational database is an important and challengeable research topic. An improved model of conditional random fields was presented to combine the learning of labeled samples and unlabeled database records in order to reduce the dependence on tediously hand-labeled training data. The pro- posed model was used to solve the problem of schema matching between data source schema and database schema. Experimental results using a large number of Web pages from diverse domains show the novel approach's effectiveness.