Abstract
This paper proposes s-Tree, an Apriori-based algorithm for mining maximal frequent access patterns in Web logs. The algorithm makes three main contributions. First, it represents each user session as a directed tree, which makes it possible to mine maximal forward reference transactions and users' preferred browsing paths. Second, it counts support with a method that gives priority to content pages, which uncovers specific user access patterns that traditional algorithms cannot find. Third, it joins a frequent pattern tree with layered frequent arcs, which avoids the join-condition check needed when graph-structured data mining algorithms join two frequent pattern trees directly, and it applies a pre-pruning strategy; together these reduce the algorithm's overhead. Experiments show that s-Tree is scalable and runs more efficiently than directly applying graph-structured pattern mining algorithms such as AGM (Apriori-based Graph Mining) and FSG (Frequent Subgraph Discovery).
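The abstract does not give the details of s-Tree itself, so the following is only a minimal sketch of the general idea of splitting one session into maximal forward references, using the classic backward-move rule rather than the paper's own procedure. The function names `maximal_forward_references` and `session_arcs` and the sample click sequence are hypothetical, chosen for illustration.

```python
def maximal_forward_references(session):
    """Split one click sequence into maximal forward references.

    Assumption: a visit to a page already on the current path is a
    backward move (e.g. the browser's Back button), not a new reference.
    """
    path = []               # current forward path from the first page
    moving_forward = False  # True if the previous step extended the path
    result = []
    for page in session:
        if page in path:
            # Backward move: emit the path only if we had just moved forward.
            if moving_forward:
                result.append(list(path))
            # Cut the path back so that it ends at the revisited page.
            del path[path.index(page) + 1:]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward:
        result.append(list(path))
    return result


def session_arcs(session):
    """Directed (parent, child) arcs of the tree implied by the session."""
    arcs = set()
    for ref in maximal_forward_references(session):
        arcs.update(zip(ref, ref[1:]))
    return arcs


# Hypothetical session: each letter stands for one page.
session = list("ABCDCBEGHGWAOUOV")
mfrs = ["".join(p) for p in maximal_forward_references(session)]
print(mfrs)
```

Under this rule, each emitted reference is a root-to-leaf path of the session's directed tree, and `session_arcs` collects the arcs that a frequent-arc counting step could then work on.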
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2006, No. 7, pp. 1662-1665 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (60373023)