A Mathematical Solution to String Matching for Big Data Linking (cited by: 1)
Authors: Kevin McCormack, Mary Smyth. Journal of Statistical Science and Application, 2017, No. 2, pp. 39-55 (17 pages).
This paper describes how data records can be matched across large datasets using a technique called the Identity Correlation Approach (ICA). The ICA technique is then compared with a string matching exercise. Both the string matching exercise and the ICA technique were employed for a big data project carried out by the CSO. The project, called the SESADP (Structure of Earnings Survey Administrative Data Project), involved linking the Irish Census dataset 2011 to a large Public Sector Dataset. The ICA technique provides a mathematical tool to link the datasets, and the matching rate for an exact match can be calculated before the matching process begins. Based on the number of variables and the size of the population, the matching rate is calculated in the ICA approach from the MRUI (Matching Rate for Unique Identifier) formula, and false positives are eliminated. No string matching is used in the ICA, so names are not required on the dataset, which makes the data more secure and ensures confidentiality. The SESADP project was highly successful using the ICA technique. A comparison of the results from the string matching exercise and from the ICA for the SESADP is discussed here.
Keywords: Big Data; Data Linking; Identity Correlation Approach; String Matching; Public Sector Datasets; data privacy.
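The abstract's idea of linking on a combination of non-name variables can be illustrated with a minimal sketch. It is not the authors' ICA implementation and it does not reproduce the MRUI formula, which this listing does not give. The variable names (`birth_date`, `sex`, `county`), the record layout, and the sample records are all hypothetical. The sketch measures what share of records carry a key combination that is unique within a dataset, then links only records whose key is unique on both sides, so ambiguous combinations are left unlinked rather than matched falsely.

```python
from collections import Counter


def key_of(record, variables):
    """Build the linking key from the chosen variables (no names involved)."""
    return tuple(record[v] for v in variables)


def unique_key_share(records, variables):
    """Share of records whose key combination occurs exactly once.

    This is an empirical count on the data at hand, not the MRUI formula.
    """
    if not records:
        return 0.0
    counts = Counter(key_of(r, variables) for r in records)
    unique = sum(1 for r in records if counts[key_of(r, variables)] == 1)
    return unique / len(records)


def link_exact(left, right, variables):
    """Link records whose key is identical and unique in both datasets."""
    left_counts = Counter(key_of(r, variables) for r in left)
    right_counts = Counter(key_of(r, variables) for r in right)
    # Index only the right-hand records with an unambiguous key.
    right_index = {
        key_of(r, variables): r
        for r in right
        if right_counts[key_of(r, variables)] == 1
    }
    pairs = []
    for rec in left:
        k = key_of(rec, variables)
        if left_counts[k] == 1 and k in right_index:
            pairs.append((rec, right_index[k]))
    return pairs


# Hypothetical records standing in for a census file and an administrative file.
variables = ("birth_date", "sex", "county")
census = [
    {"id": "c1", "birth_date": "1980-01-01", "sex": "F", "county": "Cork"},
    {"id": "c2", "birth_date": "1980-01-01", "sex": "F", "county": "Cork"},
    {"id": "c3", "birth_date": "1975-06-30", "sex": "M", "county": "Dublin"},
    {"id": "c4", "birth_date": "1990-03-15", "sex": "F", "county": "Galway"},
]
admin = [
    {"id": "a1", "birth_date": "1975-06-30", "sex": "M", "county": "Dublin"},
    {"id": "a2", "birth_date": "1980-01-01", "sex": "F", "county": "Cork"},
    {"id": "a3", "birth_date": "1990-03-15", "sex": "F", "county": "Galway"},
]

share = unique_key_share(census, variables)
links = [(l["id"], r["id"]) for l, r in link_exact(census, admin, variables)]
print(share, links)
```

In this toy data the two Cork records share a key, so neither is linked. Adding a further variable to the key would be one way to separate them, which reflects the abstract's point that the achievable matching rate depends on the number of variables and the size of the population.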