Abstract: Privacy protection for big data linking is discussed here in relation to the 'Structure of Earnings Survey - Administrative Data Project' (SESADP), a big data linking project of the Central Statistics Office (CSO), Ireland. The project created datasets and statistical outputs for the years 2011 to 2014 to meet Eurostat's annual earnings statistics requirements and the Structure of Earnings Survey (SES) Regulation. Record linking across the Census and various public sector datasets provided the information needed to meet the Eurostat earnings requirements. However, the risk of statistical disclosure (i.e. identifying an individual in the dataset) is high unless privacy and confidentiality safeguards are built into the data matching process. This paper looks at the three methods of linking records on big datasets employed on the SESADP, and at how to anonymise the data to protect the identity of individuals where potentially disclosive variables exist.
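As a rough illustration of the disclosure risk this abstract describes, the sketch below links two toy tables on a keyed hash of a shared identifier and then flags combinations of quasi-identifiers shared by fewer than k records. The abstract does not name the three linking methods or the anonymisation technique used on the SESADP, so the field names, the secret key, the threshold `k` and the helper functions are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter

SECRET_KEY = b"example-key"  # hypothetical; a real key would be kept outside the code

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def link(left, right, id_field="person_id"):
    """Deterministic link: join two lists of dicts on the pseudonymised identifier."""
    index = {pseudonymise(r[id_field]): r for r in right}
    linked = []
    for row in left:
        key = pseudonymise(row[id_field])
        if key in index:
            merged = {**row, **index[key]}
            merged.pop(id_field)       # drop the direct identifier
            merged["pseudo_id"] = key  # keep only the pseudonym
            linked.append(merged)
    return linked

def risky_cells(rows, quasi_identifiers, k=2):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return {combo for combo, n in counts.items() if n < k}

# Toy data, invented for illustration only.
census = [
    {"person_id": "A1", "age_band": "30-39", "county": "Cork"},
    {"person_id": "B2", "age_band": "30-39", "county": "Cork"},
    {"person_id": "C3", "age_band": "60-69", "county": "Leitrim"},
]
earnings = [
    {"person_id": "A1", "annual_earnings": 41000},
    {"person_id": "B2", "annual_earnings": 38500},
    {"person_id": "C3", "annual_earnings": 52000},
]

linked = link(census, earnings)
flagged = risky_cells(linked, ["age_band", "county"], k=2)
```

Here `flagged` holds the one combination that occurs only once in the linked data. Such a cell would be a candidate for suppression or coarser banding before release.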
Funding: Project 40471090 supported by the National Natural Science Foundation of China, and 2006-1 by the Open Foundation of the Key Lab of Resource Environment and GIS, Beijing City, China
Abstract: Because conceptual and representational differences arise among multiple representations, maintaining inter-connectivity among them is a foundational task in building a multi-scale data model. Existing methods are still not satisfactory in practice; inter-connectivity among multiple representations can be achieved only if the multi-scale model can explicitly inter-relate them and deal with their differences. This paper first explores the relations among multiple representations of the same entity, such as multi-semantic, multi-geometry, multi-attribute and hierarchical semantic relations. On this basis, it proposes aggregation-based semantic hierarchical matching rules (ASHMR) as the basis for tackling inter-connectivity among multi-representations, and defines the available hierarchical semantic knowledge, namely semantically equal, semantically related and semantically irrelevant. The applications and techniques of the corresponding hierarchical inter-connectivity matching criteria are explored according to the different changes among multi-representations of different types of objects. Taking road intersections as an example, a detailed case describes the strategies of inter-connectivity maintenance and shows that the method is feasible for dealing with inter-connectivity.
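The three categories of hierarchical semantic knowledge named in this abstract can be sketched as a lookup over an aggregation hierarchy. The abstract does not give the ASHMR definitions, so this is only one plausible reading. It assumes that two classes are "related" when one is an ancestor of the other, and that siblings count as "irrelevant". The class names and the `PARENT` table are hypothetical.

```python
# Hypothetical aggregation hierarchy: child class -> parent class.
PARENT = {
    "roundabout": "road_intersection",
    "interchange": "road_intersection",
    "road_intersection": "road_network",
    "lake": "water_body",
}

def ancestors(cls):
    """Yield every class above cls in the aggregation hierarchy."""
    while cls in PARENT:
        cls = PARENT[cls]
        yield cls

def semantic_relation(a, b):
    """Classify two class labels as equal, related, or irrelevant (assumed rules)."""
    if a == b:
        return "semantically equal"
    if b in ancestors(a) or a in ancestors(b):
        return "semantically related"
    return "semantically irrelevant"
```

For example, under these assumptions `semantic_relation("roundabout", "road_network")` returns `"semantically related"`, since a roundabout aggregates into an intersection and then into the network.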
Abstract: With growing demand for multi-purpose or multi-modal navigation, route calculation needs to traverse semantically enriched road networks for different transportation modes. Currently, operational route planning algorithms show rather limited performance, or their potential for comprehensive applications is constrained by unavailable or insufficient interoperation among the underlying geo-data, which are maintained separately in different spatial databases. To overcome this limitation, a novel approach has been proposed to integrate the routing-relevant information from different data sources. It involves three processes: (1) automatic matching to identify the corresponding road objects between different datasets; (2) interaction to refine the automatic matching result; and (3) transferring the routing-relevant information from one dataset to another. In process (1), the Delimited Stroke Oriented algorithm is employed to achieve automatic data matching between different datasets, and it has shown a high matching rate and certainty. However, uncertain matching problems occur in areas where topological conditions are too complicated or inconsistent. The remaining unmatched or wrongly matched objects are treated in process (2) with the help of a series of interaction tools. On the basis of the refined matching results, process (3) is dedicated to automatic integration of the routing-relevant information from different data sources.
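The three processes listed in this abstract can be outlined as a small pipeline. This is a sketch only: it does not implement the Delimited Stroke Oriented algorithm, and substitutes a naive endpoint-distance matcher for it. The thresholds, field names and function names are hypothetical.

```python
import math

def endpoint_distance(a, b):
    """Sum of distances between corresponding endpoints of two road segments."""
    return math.dist(a["start"], b["start"]) + math.dist(a["end"], b["end"])

def auto_match(source, target, accept=10.0, review=30.0):
    """Process (1): pair each source road with its nearest target road."""
    matched, uncertain = {}, []
    for sid, s in source.items():
        tid, d = min(
            ((t_id, endpoint_distance(s, t)) for t_id, t in target.items()),
            key=lambda pair: pair[1],
        )
        if d <= accept:
            matched[sid] = tid
        elif d <= review:
            uncertain.append((sid, tid))  # left for interactive refinement
    return matched, uncertain

def refine(matched, decisions):
    """Process (2): apply manual decisions; a value of None removes a match."""
    result = dict(matched)
    for sid, tid in decisions.items():
        if tid is None:
            result.pop(sid, None)
        else:
            result[sid] = tid
    return result

def transfer(source, target, matches, fields):
    """Process (3): copy routing-relevant fields from source to matched target roads."""
    for sid, tid in matches.items():
        for f in fields:
            if f in source[sid]:
                target[tid][f] = source[sid][f]
    return target

# Toy datasets, invented for illustration only.
source = {
    "s1": {"start": (0, 0), "end": (100, 0), "speed_limit": 50},
    "s2": {"start": (0, 50), "end": (100, 50), "speed_limit": 30},
}
target = {
    "t1": {"start": (1, 1), "end": (101, 1)},
    "t2": {"start": (0, 62), "end": (100, 62)},
}

matched, uncertain = auto_match(source, target)
final = refine(matched, {"s2": "t2"})  # an operator confirms the uncertain pair
merged = transfer(source, target, final, ["speed_limit"])
```

In this toy run, `s1` is matched automatically, `s2` falls into the uncertain band and is confirmed by hand, and both speed limits end up on the target roads.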