Abstract
This paper first introduces the theory of Web mining from three aspects: its definition, its tasks, and its categorization. It then briefly describes several key techniques used to implement the Web text mining system WTMiner (Web Text Miner): word segmentation, feature extraction, and classifier design. For word segmentation, dictionary lookup combines a hash on the first Chinese character of a string with binary search, which speeds up segmentation. For classifier design, to address the slow training of the SVM (support vector machine), a K-nearest-neighbor method is used to reduce the number of samples in the training set, which greatly increases the speed of the algorithm.
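The dictionary lookup mentioned in the abstract (a hash on the first character, followed by binary search) might be sketched as follows. This is only an illustrative sketch, not WTMiner's actual code: the function names `build_index`, `contains`, and `segment` are hypothetical, and forward maximum matching with a `max_len` limit is an assumed segmentation strategy that the abstract does not specify.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(words):
    """Group dictionary words by first character; keep each bucket sorted."""
    index = defaultdict(list)
    for w in set(words):
        index[w[0]].append(w)
    for bucket in index.values():
        bucket.sort()  # sorted order is what makes binary search possible
    return dict(index)


def contains(index, word):
    """Hash on the first character, then binary-search within its bucket."""
    bucket = index.get(word[0])
    if not bucket:
        return False
    i = bisect_left(bucket, word)
    return i < len(bucket) and bucket[i] == word


def segment(index, text, max_len=4):
    """Forward maximum matching (assumed strategy, not stated in the source)."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or contains(index, cand):
                out.append(cand)
                i += n
                break
    return out
```

The first-character hash narrows each lookup to a small bucket, so the binary search only runs over words that share that character rather than over the whole dictionary.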
Source
《计算机工程》
CAS
CSCD
PKU Core (北大核心)
2002, No. 8, pp. 141-142, 151 (3 pages)
Computer Engineering