Structured Ranking Functions and Term Weighting Techniques for Web Information Retrieval (cited by: 1)

Survey on structured ranking function and term weighting technology of Web information retrieval

Abstract: This paper analyzes the current state of Web information retrieval (IR) technology and argues that the root cause of its poor retrieval efficiency lies in the ranking functions and term weighting techniques that search engines adopt. It introduces the classic IR ranking functions and term weighting techniques, then examines the characteristics of Web documents: most are HTML documents, a kind of structured document whose structure is defined explicitly by predefined tags, and different structural elements contribute differently to retrieval performance. The paper compares the work of researchers in China and abroad who exploit these characteristics to extend classic ranking functions and term weighting techniques into structured ones. Finally, it discusses development directions for ranking functions and term weighting techniques in Web information retrieval.
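The idea that terms in different HTML structures contribute differently to retrieval can be illustrated with a minimal sketch. It is not the method of the paper or of any work it surveys: the tag weights in `TAG_WEIGHTS`, the class `WeightedTermCounter`, and the function `score` are all hypothetical, and the scoring is a plain sum of weighted term frequencies rather than a full ranking function.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical per-tag weights; the abstract gives no concrete values.
TAG_WEIGHTS = {"title": 4.0, "h1": 3.0, "b": 2.0}
DEFAULT_WEIGHT = 1.0


class WeightedTermCounter(HTMLParser):
    """Accumulates term frequencies, scaling each occurrence by its enclosing tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []           # stack of currently open tags
        self.weighted_tf = Counter()  # term -> structure-weighted frequency

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Use the largest weight among all enclosing tags.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for term in data.lower().split():
            self.weighted_tf[term] += weight


def score(html, query):
    """Sum the structure-weighted frequencies of the query terms."""
    counter = WeightedTermCounter()
    counter.feed(html)
    counter.close()
    return sum(counter.weighted_tf[t] for t in query.lower().split())


doc = ("<html><head><title>web retrieval</title></head>"
       "<body><p>retrieval of <b>web</b> pages</p></body></html>")
print(score(doc, "web retrieval"))  # 11.0
```

In this sketch a term in `<title>` counts four times as much as the same term in ordinary body text, so the two query terms score 6.0 and 5.0 respectively. A structured ranking function in the sense of the abstract would combine such weights with a classic scheme such as TF-IDF; that combination is omitted here.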

Authors: Zhao Zhengwen, Kang Yaohong

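Affiliation: Key Laboratory of Communication and Information Systems, College of Information Science and Technology, Hainan University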

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list; 2007, No. 11, pp. 181-184 (4 pages)

Funding: Key Science and Technology Research Project of the Ministry of Education of China (No. 03144); Natural Science Foundation of Hainan Province of China (Grant No. 60533).

Keywords: ranking function; term weighting; document structure; search engine

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]


