Abstract
Web usage mining applies data mining techniques to the secondary data generated by users' interactions with the Web in order to discover usage patterns, and thereby to better understand and serve the needs of Web-based applications. Several preprocessing tasks must be performed before data mining algorithms can be applied to data collected from server logs. Data preprocessing converts the raw data into the data abstractions required by the subsequent mining algorithms. Because the preprocessed data is the input to pattern discovery, its quality directly affects the final results. This paper presents several data preparation techniques and methods that improve the performance of data preprocessing for identifying unique users and user sessions. Experiments show that these techniques and methods are valid and efficient. Finally, the paper concludes and proposes directions for future research.
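The abstract does not spell out the paper's own techniques, so the following is only a minimal sketch of what identifying unique users and user sessions from server log records can look like. It assumes each record has already been parsed into an (ip, agent, timestamp, url) tuple, approximates a user by the (ip, agent) pair, and splits sessions on a 30-minute inactivity gap. The function name `sessionize`, the record layout, and the timeout are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a 30-minute inactivity timeout, a common heuristic;
# the value is not taken from the paper.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (ip, agent, timestamp, url) records into per-user sessions.

    A user is approximated by the (ip, agent) pair. A new session starts
    whenever the gap between two consecutive requests of the same user
    exceeds `timeout`.
    """
    # Step 1: user identification, bucket requests by (ip, agent).
    by_user = defaultdict(list)
    for ip, agent, ts, url in records:
        by_user[(ip, agent)].append((ts, url))

    # Step 2: session identification, split each user's requests on long gaps.
    sessions = {}
    for user, hits in by_user.items():
        hits.sort(key=lambda hit: hit[0])  # order requests by timestamp
        user_sessions = []
        last_ts = None
        for ts, url in hits:
            if last_ts is None or ts - last_ts > timeout:
                user_sessions.append([])  # open a new session
            user_sessions[-1].append(url)
            last_ts = ts
        sessions[user] = user_sessions
    return sessions


if __name__ == "__main__":
    # Hypothetical, hand-written records for illustration only.
    demo = [
        ("10.0.0.1", "Mozilla", datetime(2007, 3, 1, 10, 0), "/index.html"),
        ("10.0.0.1", "Mozilla", datetime(2007, 3, 1, 10, 5), "/a.html"),
        ("10.0.0.1", "Mozilla", datetime(2007, 3, 1, 11, 0), "/b.html"),
        ("10.0.0.1", "Opera", datetime(2007, 3, 1, 10, 1), "/index.html"),
    ]
    for user, user_sessions in sessionize(demo).items():
        print(user, user_sessions)
```

Treating the (ip, agent) pair as a user is a rough heuristic: proxies and shared machines can merge distinct users, and a fixed timeout can split or merge real visits, which is one reason preprocessing quality matters for the patterns discovered afterwards.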
Source
Electronic Measurement Technology (《电子测量技术》)
2007, No. 3, pp. 3-5 (3 pages)
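Funding
Hubei Province Key Science and Technology Project (2005101C18)
Natural Science Foundation of South-Central University for Nationalities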