Abstract
Session identification is an important step in the data preprocessing stage of Web log mining. This paper proposes an improved session identification method. First, after user identification, frame pages are filtered out, which greatly reduces the number of valid pages produced in the experiment. An access time threshold is then set for each page and adjusted according to the page's importance, which is determined from the page content and the site structure. Experiments show that, compared with the traditional method of applying a single a priori threshold to all pages, the proposed method yields a session set that better reflects real user sessions.
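The pipeline described in the abstract (filter frame pages, then split a user's clickstream using a per-page time threshold scaled by page importance) can be sketched as follows. This is a minimal illustration and not the paper's algorithm: the abstract does not give the adjustment formula, so the multiplicative rule, the 600-second base threshold, the `Hit` record, and the names `page_threshold` and `identify_sessions` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One request by a single, already identified user (hypothetical record)."""
    url: str
    timestamp: float  # seconds since an arbitrary origin


def page_threshold(url, importance, base_threshold=600.0):
    # Assumed rule: scale the base threshold by the page's importance weight.
    # Pages without a weight fall back to 1.0, i.e. the plain base threshold.
    return base_threshold * importance.get(url, 1.0)


def identify_sessions(hits, importance, frame_pages=frozenset(), base_threshold=600.0):
    """Split one user's hits into sessions using per-page time thresholds."""
    # Step 1: drop frame pages, then order the remaining hits by time.
    content_hits = sorted(
        (h for h in hits if h.url not in frame_pages),
        key=lambda h: h.timestamp,
    )

    # Step 2: start a new session whenever the gap after a page exceeds
    # that page's own (importance-adjusted) threshold.
    sessions = []
    current = []
    for hit in content_hits:
        if current:
            prev = current[-1]
            gap = hit.timestamp - prev.timestamp
            if gap > page_threshold(prev.url, importance, base_threshold):
                sessions.append(current)
                current = []
        current.append(hit)
    if current:
        sessions.append(current)
    return sessions
```

Passing an empty `importance` mapping reduces the sketch to the traditional single-threshold method, which makes it easy to compare the two session sets on the same log.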
Source
《计算机技术与发展》
2008, No. 11, pp. 214-216 (3 pages)
Computer Technology and Development
Funding
Anhui Provincial Natural Science Foundation project (KJ2008B116)
Chizhou University Natural Science Foundation project (XK0829)