Herzog / Scheuren / Winkler Data Quality and Record Linkage Techniques

E-Book, English, 234 pages, eBook

ISBN: 978-0-387-69505-1
Publisher: Springer US
Format: PDF
Copy protection: 1 - PDF watermark



This book offers a practical understanding of the issues involved in improving data quality through editing, imputation, and record linkage. The first part of the book deals with methods and models, focusing on the Fellegi–Holt edit-imputation model, the Little–Rubin multiple-imputation scheme, and the Fellegi–Sunter record linkage model. The second part presents case studies in which these techniques are applied in a variety of areas, including mortgage guarantee insurance, medical and biomedical research, highway safety, and social insurance, as well as the construction of list frames and administrative lists. The book offers a mixture of practical advice, mathematical rigor, management insight, and philosophy.

Target audience


Professional/practitioner

Further information & material


Data Quality: What It is, Why It is Important, and How to Achieve It
What is Data Quality and Why Should We Care?
Examples of Entities Using Data to their Advantage/Disadvantage
Properties of Data Quality and Metrics for Measuring It
Basic Data Quality Tools
Specialized Tools for Database Improvement
Mathematical Preliminaries for Specialized Data Quality Techniques
Automatic Editing and Imputation of Sample Survey Data
Record Linkage – Methodology
Estimating the Parameters of the Fellegi–Sunter Record Linkage Model
Standardization and Parsing
Phonetic Coding Systems for Names
Blocking
String Comparator Metrics for Typographical Error
Record Linkage Case Studies
Duplicate FHA Single-Family Mortgage Records
Record Linkage Case Studies in the Medical, Biomedical, and Highway Safety Areas
Constructing List Frames and Administrative Lists
Social Security and Related Topics
Other Topics
Confidentiality: Maximizing Access to Micro-data while Protecting Privacy
Review of Record Linkage Software
Summary Chapter


7 Automatic Editing and Imputation of Sample Survey Data (p. 61)

7.1. Introduction

As discussed in Chapter 3, missing and contradictory data are endemic in computer databases. In Chapter 5, we described a number of basic data editing techniques that can be used to improve the quality of statistical data systems. By an edit we mean a set of values for a specified combination of data elements within a database that are jointly unacceptable (or, equivalently, jointly acceptable). Certainly, we can use edits of the types described in Chapter 5.
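
The idea can be made concrete with a small sketch in Python (the book itself contains no code). Here each edit lists, for a few fields, the value combinations that are jointly unacceptable, and a record fails an edit when every restricted field takes an offending value. The field names, value ranges, and the helper violates are hypothetical illustrations, not the book's notation.

# Each edit maps fields to their jointly unacceptable values (hypothetical examples).
def violates(edit, record):
    # A record fails an edit when every field the edit restricts takes an offending value.
    return all(record.get(field) in values for field, values in edit.items())

edits = [
    {"age": range(0, 15), "marital_status": {"married"}},           # a child recorded as married
    {"employment": {"unemployed"}, "hours_worked": range(1, 100)},  # unemployed yet reporting hours
]

record = {"age": 12, "marital_status": "married", "employment": "student", "hours_worked": 0}
failed = [i for i, edit in enumerate(edits) if violates(edit, record)]
print(failed)  # [0] -- the record fails the age/marital-status edit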

In this chapter, we discuss automated procedures for editing (i.e., cleaning up) and imputing (i.e., filling in) missing data in databases constructed from data obtained from respondents in sample surveys or censuses. To accomplish this task, we need efficient ways of developing statistical data edit/imputation systems that minimize development time, eliminate most errors in code development, and greatly reduce the need for human intervention.

In particular, we would like to drastically reduce, or eliminate entirely, the need for humans to change/correct data. The goal is to improve survey data so that they can be used for their intended analytic purposes.

One such important purpose is the publication of estimates of totals and subtotals that are free of self-contradictory information. We begin by discussing editing procedures, focusing on the model proposed by Fellegi and Holt [1976]. Their model was the first to provide fast, reproducible, table-driven methods that could be applied to general data. It was the first to assure that a record could be corrected in one pass through the data.
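
To give a rough sense of the error-localization step behind such one-pass correction, the sketch below exhaustively searches for a smallest set of fields which, once changed, allows a record to pass every edit. The field domains, edits, and function names are hypothetical assumptions, and production Fellegi–Holt systems rely on set-covering algorithms over implied edits rather than this brute-force enumeration.

from itertools import combinations, product

# Hypothetical discrete domains for three fields.
domains = {
    "age_group": ["child", "adult"],
    "marital_status": ["single", "married"],
    "employment": ["employed", "unemployed"],
}

# Jointly unacceptable value combinations, using the same convention as the earlier sketch.
edits = [
    {"age_group": {"child"}, "marital_status": {"married"}},
    {"age_group": {"child"}, "employment": {"employed"}},
]

def passes_all(record):
    return not any(all(record[f] in vals for f, vals in e.items()) for e in edits)

def localize(record):
    fields = list(domains)
    for k in range(len(fields) + 1):                  # try smaller change sets first
        for subset in combinations(fields, k):
            kept = {f: v for f, v in record.items() if f not in subset}
            for new_values in product(*(domains[f] for f in subset)):
                candidate = dict(kept, **dict(zip(subset, new_values)))
                if passes_all(candidate):
                    return subset, candidate          # minimal field set plus one consistent fill-in
    return None

record = {"age_group": "child", "marital_status": "married", "employment": "employed"}
print(localize(record))  # changing age_group alone suffices here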

Prior to Fellegi and Holt, records were iteratively and slowly changed with no guarantee that any final set of changes would yield a record that satisfied all edits. We then describe a number of schemes for imputing missing data elements, emphasizing the work of Rubin [1987] and Little and Rubin [1987, 2002].

Two important advantages of the Little–Rubin approach are that (1) probability distributions are preserved by the use of defensible statistical models and (2) estimated variances include a component due to the imputation. In some situations, the Little–Rubin methods may need extra information about the non-response mechanism.
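
Rubin's [1987] combining rules show where that imputation component enters: with m completed datasets, the combined estimate is the average of the m point estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below is a direct transcription of those rules; the numbers are invented purely for illustration.

from statistics import mean, variance

def combine(estimates, within_variances):
    # estimates: point estimates of the same quantity from each completed dataset
    # within_variances: the estimated variance computed within each completed dataset
    m = len(estimates)
    q_bar = mean(estimates)              # combined point estimate
    w = mean(within_variances)           # average within-imputation variance
    b = variance(estimates)              # between-imputation variance (m - 1 denominator)
    return q_bar, w + (1 + 1/m) * b      # total variance includes the imputation component

# Five completed datasets, each yielding an estimated mean income and its variance.
print(combine([41200, 40950, 41510, 41080, 41330],
              [250000, 248000, 252000, 249500, 251000]))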

For instance, if certain high-income individuals have a stronger tendency to not report or misreport income, then a specific model for the income-reporting of these individuals may be needed. In other situations, the missing-data imputation can be done via methods that are straightforward extensions of hot-deck. We provide details of hot-deck and its extensions later in this chapter.
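
Ahead of that discussion, a minimal sketch of a random within-class hot-deck may help: observed values are pooled by an imputation class, and each missing value is filled from a randomly chosen donor in the same class. The class variable "region", the data, and the function name are hypothetical.

import random

def hot_deck(records, class_var, target_var, rng=random.Random(0)):
    donors = {}
    for r in records:                    # pool observed values by imputation class
        if r[target_var] is not None:
            donors.setdefault(r[class_var], []).append(r[target_var])
    for r in records:                    # fill each gap from a random donor in its class
        if r[target_var] is None and donors.get(r[class_var]):
            r[target_var] = rng.choice(donors[r[class_var]])
    return records

data = [
    {"region": "north", "income": 52000},
    {"region": "north", "income": None},
    {"region": "south", "income": 31000},
    {"region": "south", "income": None},
]
print(hot_deck(data, "region", "income"))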

Ideally, we would like to have an all-purpose, unified edit/imputation model that incorporates the features of the Fellegi–Holt edit model and the Little–Rubin multiple imputation model. Unfortunately, we are not aware of such a model. However, Winkler [2003] provides a unified approach to edit and imputation when all of the data elements of interest can be considered to be discrete.

