JDAIP Vol.3 No.4, November 2015
Conditions of Non-Unique Identifiers in Record Linkage Using Japanese Cohort Dataset
Abstract: Unique identifiers such as name, home address and social security number are commonly used to link different datasets, and such applications are well published. The theoretical concepts of probabilistic algorithms for record linkage are also well defined in the literature. However, few studies have reported applying probabilistic algorithms with non-unique identifiers. In this paper, we investigate several variables (weight, height, waist, age, sex, smoking and drinking habits) as non-unique identifiers, using a Japanese cohort dataset with a three-year baseline (1989-1991), to observe how effectively these identifiers can be used and how they influence record linkage. We also modify the conditions of these identifiers and estimate the sensitivity, specificity and accuracy of each condition for comparison. We then repeat the analysis using an extended ten-year baseline (1989-1999). We conclude that, with the three-year baseline, the combination of age, sex, weight and height yields better sensitivity, specificity and accuracy than the other combinations in both men and women, whereas with the ten-year baseline, the combination of age, sex and height performs better in both men and women.
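The Python sketch below illustrates the general pipeline described in the abstract. It scores candidate record pairs on non-unique identifiers and then judges the resulting links by sensitivity, specificity and accuracy. It assumes the classic Fellegi-Sunter weighting, in which per-field log2 likelihood ratios are summed under the assumption that fields are conditionally independent. It is an illustration only, not the authors' actual procedure. The `Record` fields, the m- and u-probabilities in `M_U`, the agreement tolerances in `TOLERANCE`, the cut-off `THRESHOLD` and all record values are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One participant's non-unique identifiers at a single survey visit (hypothetical fields)."""
    age: int
    sex: str
    height_cm: float
    weight_kg: float


# Hypothetical m- and u-probabilities (illustrative only, not estimates from the paper):
#   m = P(field agrees | both records belong to the same person)
#   u = P(field agrees | the records belong to different people)
M_U = {
    "age": (0.95, 0.05),
    "sex": (0.99, 0.50),
    "height_cm": (0.90, 0.10),
    "weight_kg": (0.85, 0.10),
}

# Hypothetical agreement tolerances for numeric fields; "sex" must match exactly.
TOLERANCE = {"age": 1, "height_cm": 2.0, "weight_kg": 3.0}

THRESHOLD = 5.0  # hypothetical cut-off: a pair with weight >= THRESHOLD is declared a link


def agrees(field, a, b):
    """Return True if records a and b agree on the given identifier."""
    va, vb = getattr(a, field), getattr(b, field)
    if field == "sex":
        return va == vb
    return abs(va - vb) <= TOLERANCE[field]


def match_weight(a, b, fields=tuple(M_U)):
    """Fellegi-Sunter composite weight summed over the chosen identifiers."""
    total = 0.0
    for field in fields:
        m, u = M_U[field]
        if agrees(field, a, b):
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def linkage_metrics(predicted, actual):
    """Return (sensitivity, specificity, accuracy) of predicted links against true status."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, t in pairs if p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


# Hypothetical candidate pairs: (record at visit 1, record at visit 2, truly the same person?)
PAIRS = [
    (Record(52, "M", 168.0, 65.0), Record(52, "M", 168.5, 66.0), True),
    (Record(47, "F", 155.0, 52.0), Record(48, "F", 155.5, 57.0), True),
    (Record(60, "M", 170.0, 70.0), Record(41, "M", 160.0, 58.0), False),
    (Record(55, "F", 152.0, 50.0), Record(55, "F", 152.0, 51.0), False),  # look-alike strangers
    (Record(63, "M", 172.0, 80.0), Record(64, "M", 168.0, 72.0), True),
    (Record(50, "M", 165.0, 60.0), Record(50, "F", 158.0, 61.0), False),
]

if __name__ == "__main__":
    weights = [match_weight(a, b) for a, b, _ in PAIRS]
    predicted = [w >= THRESHOLD for w in weights]
    actual = [truth for _, _, truth in PAIRS]
    sens, spec, acc = linkage_metrics(predicted, actual)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

When run as a script, the sketch prints `sensitivity=0.667 specificity=0.667 accuracy=0.667`. The fourth pair consists of two different people who happen to agree on age, sex, height and weight, so it becomes a false positive. This shows the basic limitation of non-unique identifiers. Passing a subset such as `fields=("age", "sex", "height_cm")` to `match_weight` scores pairs on a single identifier combination. That is one way to compare combinations like those reported in the abstract. In practice, the m- and u-probabilities would be estimated from the data, for example with the EM algorithm, rather than fixed by hand.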
Cite this paper: Nakai, M., Nishimura, K. and Miyamoto, Y. (2015) Conditions of Non-Unique Identifiers in Record Linkage Using Japanese Cohort Dataset. Journal of Data Analysis and Information Processing, 3, 103-111. doi: 10.4236/jdaip.2015.34011.