Abstract
Entity relation extraction is an important research topic in information extraction. This paper applies two feature-vector-based machine learning algorithms, Winnow and Support Vector Machines (SVM), to extract entity relations from the training data of the 2004 ACE (Automatic Content Extraction) evaluation. Both algorithms require appropriate feature selection. Both perform best when the two words to the left and right of each entity are selected as features, reaching weighted average F-scores of 73.08% (Winnow) and 73.27% (SVM). When different learning algorithms are given the same feature set, their final performance in recognizing entity relations therefore differs very little. We conclude that, when automatic learning methods are used to extract entity relations, effort should be concentrated on finding better features.
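The abstract names Winnow and "the two words to the left and right of each entity" as features, but gives no implementation details. The following is a minimal sketch, not the authors' implementation, of how such window features could feed the classic Winnow2 mistake-driven update (Littlestone's multiplicative promotion/demotion rule) for a single binary relation decision. The feature-string format, the helper names `window_features` and `pair_features`, and the values of the threshold `theta` and factor `alpha` are all hypothetical choices; the exact Winnow variant used in the paper is not stated.

```python
from typing import Dict, Iterable, List, Set, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of an entity mention


def window_features(tokens: List[str], span: Span, k: int = 2, tag: str = "E") -> Set[str]:
    """Return the words up to k positions left and right of an entity span.

    Each feature records side (L/R) and distance, e.g. "E1_R1=works".
    The naming scheme is a hypothetical choice for illustration.
    """
    start, end = span
    feats: Set[str] = set()
    for offset in range(1, k + 1):
        left, right = start - offset, end - 1 + offset
        if left >= 0:
            feats.add(f"{tag}_L{offset}={tokens[left]}")
        if right < len(tokens):
            feats.add(f"{tag}_R{offset}={tokens[right]}")
    return feats


def pair_features(tokens: List[str], e1: Span, e2: Span, k: int = 2) -> Set[str]:
    """Hypothetical relation-instance features: the union of both entities' windows."""
    return window_features(tokens, e1, k, "E1") | window_features(tokens, e2, k, "E2")


class Winnow:
    """Binary Winnow2 over sparse boolean features (unseen weights default to 1.0)."""

    def __init__(self, alpha: float = 2.0, theta: float = 4.0) -> None:
        self.alpha = alpha  # multiplicative promotion/demotion factor (assumed value)
        self.theta = theta  # fixed decision threshold (assumed value)
        self.w: Dict[str, float] = {}

    def score(self, feats: Iterable[str]) -> float:
        # Sum of the weights of the active (present) features.
        return sum(self.w.get(f, 1.0) for f in feats)

    def predict(self, feats: Iterable[str]) -> int:
        return int(self.score(feats) >= self.theta)

    def update(self, feats: Set[str], label: int) -> bool:
        """Mistake-driven update; returns True if the example was misclassified."""
        if self.predict(feats) == label:
            return False
        # Promote active weights on a missed positive, demote them on a false positive.
        factor = self.alpha if label == 1 else 1.0 / self.alpha
        for f in feats:
            self.w[f] = self.w.get(f, 1.0) * factor
        return True

    def fit(self, data: List[Tuple[Set[str], int]], epochs: int = 10) -> "Winnow":
        for _ in range(epochs):
            mistakes = sum(self.update(x, y) for x, y in data)
            if mistakes == 0:  # no errors on the training set; stop early
                break
        return self
```

For instance, on the tokens `John Smith works for IBM in Boston`, `pair_features(tokens, (0, 2), (4, 5))` yields six features: `E1_R1=works`, `E1_R2=for`, `E2_L1=for`, `E2_L2=works`, `E2_R1=in`, `E2_R2=Boston`. Because ACE relations come in several types, one plausible setup is to train one such binary classifier per relation type (one-vs-rest); the abstract does not say whether the paper does this. Winnow's multiplicative updates suit this kind of sparse, high-dimensional lexical feature space, where only a handful of the many possible context words are active per instance.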
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal
2005, Issue 2, pp. 1-6 (6 pages)
Funding
National Natural Science Foundation of China (grant no. 60435020)
Keywords
computer application
Chinese information processing
entity relation extraction
ACE evaluation
feature selection