
An Improved Web Structure Similarity Based on Matching Algorithm of Tree Paths (cited by: 8)
Abstract: This paper proposes an improved Web page structure similarity algorithm based on tree path matching. The algorithm defines a sequence similarity and a position similarity for tree paths, extracts the set of tree paths of each Web page, and computes the structural similarity from the best tree path matching between two pages. Experimental results show that the similarity computed by the improved algorithm agrees better with the actual page structure, and is more reasonable and effective, than that of the traditional tree path matching method.
Source: Journal of Jilin University: Science Edition (indexed in CAS, CSCD, and the Peking University Core Journals list), 2012, No. 6, pp. 1199-1203 (5 pages)
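Funding: National Natural Science Foundation of China (Grant Nos. 61152001, 61170111); Open Project Fund of the Key Laboratory of Complex Systems Management and Control, Institute of Automation, Chinese Academy of Sciences (Grant No. 20110102); Fundamental Research Funds for the Central Universities (Grant No. SWJTU11ZT08)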
Keywords: Web structure similarity; sequence similarity; position similarity
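
The pipeline described in the abstract (extract tree paths, score pairs of paths by sequence and position, then aggregate the best matches) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: sequence similarity is approximated with `difflib.SequenceMatcher.ratio()`, position similarity with the gap between relative document-order positions, the two are mixed with a hypothetical weight `alpha`, and "best matching" is taken as a per-path maximum averaged in both directions. The names `PathCollector`, `tree_paths`, and `structure_similarity` are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are always leaves.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collects root-to-leaf tag paths in document order."""

    def __init__(self):
        super().__init__()
        self.stack = []    # open tags from the root to the current node
        self.is_leaf = []  # parallel to stack: True until a child element appears
        self.paths = []    # one tuple of tag names per leaf element

    def handle_starttag(self, tag, attrs):
        if self.is_leaf:
            self.is_leaf[-1] = False  # the parent now has a child element
        if tag in VOID_TAGS:
            self.paths.append(tuple(self.stack + [tag]))
            return
        self.stack.append(tag)
        self.is_leaf.append(True)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # ignore stray closing tags
        while self.stack:  # also closes any unclosed descendants
            top = self.stack.pop()
            if self.is_leaf.pop():
                self.paths.append(tuple(self.stack + [top]))
            if top == tag:
                break


def tree_paths(html):
    """Return the list of root-to-leaf tag paths of an HTML string."""
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    while parser.stack:  # flush elements left open at end of input
        parser.handle_endtag(parser.stack[-1])
    return parser.paths


def sequence_similarity(p, q):
    """Assumed measure: ratio of matching tags between two paths, in [0, 1]."""
    return SequenceMatcher(None, p, q).ratio()


def position_similarity(i, n, j, m):
    """Assumed measure: 1 minus the gap between relative positions i/n and j/m."""
    a = i / (n - 1) if n > 1 else 0.0
    b = j / (m - 1) if m > 1 else 0.0
    return 1.0 - abs(a - b)


def structure_similarity(html_a, html_b, alpha=0.5):
    """Average best-match score of every path, taken in both directions."""
    pa, pb = tree_paths(html_a), tree_paths(html_b)
    if not pa or not pb:
        return 1.0 if pa == pb else 0.0

    def best(src, dst):
        total = 0.0
        for i, p in enumerate(src):
            total += max(
                alpha * sequence_similarity(p, q)
                + (1 - alpha) * position_similarity(i, len(src), j, len(dst))
                for j, q in enumerate(dst)
            )
        return total / len(src)

    return (best(pa, pb) + best(pb, pa)) / 2


page_a = "<html><body><div><p>a</p><p>b</p></div></body></html>"
page_b = "<html><body><ul><li>x</li></ul></body></html>"
print(tree_paths(page_a))
print(structure_similarity(page_a, page_b))
```

Averaging the two directions keeps the score symmetric, which a one-directional best match would not guarantee when the pages have different numbers of paths. Whether the paper uses a one-to-one assignment instead of a per-path maximum is not stated in the abstract.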