Funding: the National Natural Science Foundation of China (Grant No. 40701134); the National Hi-Tech Research and Development Program of China (Grant No. 2007AA12Z216).
Abstract: As background knowledge for geographic information retrieval (GIR), gazetteers have their limitations. In this paper we propose to develop and implement a common sense geographic knowledge base (CSGKB) to use instead of gazetteers. We define CSGKB as concerned with the representation of geographic knowledge in the human brain and the simulation of geographic reasoning in daily life. A traditional geographic information system (GIS) is based on the map model, with its data based on geographic coordinates and its computation based on geometry. CSGKB, by contrast, is made up of geographic features and relationships and is based on qualitative spatio-temporal reasoning, so it can be viewed as a direct model of the geographic world. This paper also discusses the characteristics of CSGKB and presents its structure, which is composed of a knowledge base, an inference engine, a geographic ontology, and a learner. The applications using CSGKB include geographic information retrieval (GIR), natural language processing (NLP), named entity recognition (NER), the Semantic Web, etc. At present, our work focuses on the design of the geographic ontology and the implementation of the CSGKB knowledge base. In this paper we describe the CSGKB ontology structure, top ontology, geographic location ontology, spatial relationship ontology, and domain ontologies. Finally, we introduce the current state of implementation of CSGKB and give an outlook on our future research.
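To illustrate what "features and relationships" with qualitative reasoning could look like in contrast to coordinate geometry, here is a minimal sketch. It is not the CSGKB implementation: the class `TinyGeoKB`, the single `part_of` relation, and the sample place names are all assumptions made for illustration, and the only inference shown is transitivity of containment.

```python
from collections import defaultdict


class TinyGeoKB:
    """Hypothetical toy knowledge base: features linked by a part_of relation."""

    def __init__(self):
        # Maps a feature name to the set of features it is directly part of.
        self._part_of = defaultdict(set)

    def add_part_of(self, part, whole):
        """Record the qualitative fact 'part is part of whole'."""
        self._part_of[part].add(whole)

    def is_part_of(self, part, whole):
        """Return True if 'whole' is reachable from 'part' via part_of links."""
        seen = set()
        stack = [part]
        while stack:
            current = stack.pop()
            for parent in self._part_of.get(current, ()):
                if parent == whole:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return False


# Illustrative facts only; no coordinates or geometry are involved.
kb = TinyGeoKB()
kb.add_part_of("Haidian District", "Beijing")
kb.add_part_of("Beijing", "China")
```

The query `kb.is_part_of("Haidian District", "China")` is answered by chaining the two stored facts, which is the kind of inference a gazetteer lookup alone does not provide.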
Funding: the Trans-Atlantic Platform for the Social Sciences and Humanities, through the Digging into Data project with reference HJ-253525, and also through the Reassembling the Republic of Letters networking programme (EU COST Action IS1310). The researchers from INESC-ID also had financial support from Fundação para a Ciência e a Tecnologia (FCT), through project grants with references PTDC/EEI-SCR/1743/2014 (Saturn) and CMUP-ERI/TIC/0046/2014 (GoLocal), as well as through the INESC-ID multi-annual funding from the PIDDAC programme (UID/CEC/50021/2013).
Abstract: Several tasks related to geographical information retrieval and to the geographical information sciences involve toponym matching, that is, the problem of matching place names that share a common referent. In this article, we present the results of a wide-ranging evaluation of the performance of different string similarity metrics on the toponym matching task. We also report on experiments that use supervised machine learning to combine multiple similarity metrics, which has the natural advantage of avoiding the manual tuning of similarity thresholds. Experiments with a very large dataset show that the performance differences between the individual similarity metrics are relatively small, and that carefully tuning the similarity threshold is important for achieving good results. The methods based on supervised machine learning, particularly when considering ensembles of decision trees, can achieve good results on this task, significantly outperforming the individual similarity metrics.
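As a rough illustration of threshold-based toponym matching with a single string similarity metric, the sketch below uses normalized Levenshtein (edit) distance. The abstract does not say which metrics were evaluated, so this choice is an assumption, as are the function names, the sample place names, and the default threshold of 0.75, which is arbitrary and would need tuning on real data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical strings."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_match(a: str, b: str, threshold: float = 0.75) -> bool:
    """Hypothetical matcher: the threshold is an assumed value, not a tuned one."""
    return edit_similarity(a, b) >= threshold
```

With these definitions, `is_match("Lisboa", "Lisbon")` holds at the assumed threshold while `is_match("Lisboa", "Porto")` does not. A supervised approach of the kind the abstract describes would instead feed several such similarity scores to a classifier as features, so that no threshold has to be set by hand.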