Abstract
This paper introduces Web data collection (web scraping) technology and analyzes its advantages in converting unstructured data into structured data: high speed and high accuracy. It explains the principles of web scraping at the HTTP protocol level and focuses on how to implement a Python-based web scraping solution. A web scraping system can be divided into two modules: an HTTP interaction module and a data parsing module.
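The two-module split described in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. This is not the paper's implementation: the function names `fetch` and `parse_links`, the class `LinkExtractor`, the `User-Agent` string, and the choice of extracting hyperlinks as the "structured data" are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url, timeout=10.0):
    """HTTP interaction module (hypothetical): GET a page and return its text."""
    # The User-Agent value is an illustrative placeholder, not from the paper.
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkExtractor(HTMLParser):
    """Data parsing module (hypothetical): collect href/text pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"href": self._href, "text": text})
            self._href = None


def parse_links(html):
    """Turn unstructured HTML into a list of structured link records."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

Under these assumptions, the two modules compose as `parse_links(fetch(url))`. Keeping them separate means the parsing step can be exercised on a saved HTML string without any network access.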
Source
Electronic Science and Technology (《电子科技》)
2012, No. 11, pp. 118-120 (3 pages)