Elliot Statistical Confidentiality
Principles and Practice
E-book, English, 200 pages
1st edition 2011
ISBN: 978-1-4419-7802-8
Publisher: Springer-Verlag
Format: PDF
Copy protection: Adobe DRM
Further Information & Material
Preface
Contents
1 Why Statistical Confidentiality?
1.1 What Is Statistical Confidentiality?
1.2 Stakeholders in the Statistical Process
1.3 The Data Stewardship Organization's Dilemma
1.4 The Value of Statistical Data
1.5 Why Are DSOs Concerned About Statistical Confidentiality?
1.5.1 A Difficult Context for a DSO
1.5.1.1 Privacy Worries
1.5.1.2 Confidentiality Concerns
1.5.1.3 Changing Legal and Social Context
1.5.1.4 Sensitivity to Social Impact—"Group Harm"
1.5.2 Providing Data and Protecting Confidentiality
1.5.3 Consequences of a Confidentiality Breach
1.5.4 What Motivates a DSO to Provide Confidentiality?
1.5.4.1 Legal Requirements and Fair Information Practices
1.5.4.2 Pragmatic Considerations
1.5.4.3 Ethical Obligations
1.6 High-Quality Statistical Data Raise Confidentiality Concerns
1.6.1 Characteristics of High-Quality Statistical Data
1.6.2 Disclosure Risk Problems Stemming from Characteristics of High-Quality Statistical Data
1.7 Disclosure Risk and the Concept of the Data Snooper
1.8 Strategies of Statistical Disclosure Limitation
1.8.1 Restricted Access
1.8.2 Restricted Data
1.9 Summary
2 Concepts of Statistical Disclosure Limitation
2.1 Conceptual Models of Disclosure Risk
2.1.1 Elements of the Disclosure Risk Problem
2.1.1.1 Microdata
2.1.1.2 Deliberate Linkage
2.1.1.3 Aggregate Data
2.1.1.4 Attribution and Subtractive Attack
2.1.1.5 Linking Tables
2.1.1.6 Hierarchical Tables
2.1.1.7 Linking Anonymized Data Sets
2.1.1.8 Spontaneous Recognition
2.1.2 Perceived and Actual Risk
2.1.3 Scenarios of Disclosure
2.1.3.1 Motivation
2.1.3.2 Means
2.1.3.3 Opportunity
2.1.3.4 Types of Attacks
2.1.3.5 Key Variables
2.1.3.6 Target Variables
2.1.3.7 Effect of Data Divergence
2.1.3.8 Likelihood of Success
2.1.4 Data Environment Analysis
2.2 Assessing the Risk
2.2.1 Uniqueness
2.2.2 Matching/Reidentification Experiments
2.2.3 Disclosure Risk Assessment for Aggregate Data
2.3 Controlling the Risk
2.3.1 Metadata Level Controls
2.3.2 Distorting the Data
2.3.3 Controlling Access
2.4 Data Utility Impact
2.5 Summary
3 Assessment of Disclosure Risk
3.1 Thresholds and Other Proxies
3.2 Risk Assessment for Microdata: Types of Matching
3.2.1 File-Level Risk Metrics
3.2.1.1 Population Uniqueness
3.2.1.2 The Proportion of Sample Uniques that are Population Unique
3.2.1.3 The Skinner and Elliot Method
3.2.2 Record-Level Risk Metrics
3.2.2.1 Probability Modeling Approaches
3.2.2.2 Special Uniqueness
3.3 Record Linkage Studies
3.3.1 Using an External Data Set
3.3.2 Using the Pre-SDL Data Set
3.3.2.1 Distance-Based Record Linkage
3.3.2.2 Probabilistic Record Linkage
3.4 Risk Assessment for Count Data
3.5 What is at Risk?: Understanding Sensitivity
3.6 Summary
4 Protecting Tabular Data
4.1 Basic Concepts
4.1.1 Structure of a Tabular Array
4.1.2 Risky Cells
4.1.2.1 Dominance Rule or (n, k)-Rule
4.1.2.2 Prior/Posterior Ambiguity Rule
4.1.2.3 n-Rule
4.1.3 The Secondary Problem: The Data Snooper's Knowledge
4.1.3.1 A Priori Knowledge
4.1.3.2 The Output Pattern
4.1.4 Disclosure Limitation
4.1.5 Loss of Information
4.1.6 The DSO's Problem
4.1.7 Disclosure Auditing
4.2 Four Methods to Protect Tables
4.2.1 Cell Suppression
4.2.2 Interval Publication
4.2.3 Controlled Rounding
4.2.4 Cell Perturbation
4.2.5 All-in-One Method
4.3 Other Methods
4.3.1 Table Redesign
4.3.2 Introducing Noise to Microdata
4.3.3 Data Swapping
4.3.4 Cyclic Perturbation
4.3.5 Random Rounding
4.3.6 Controlled Tabular Adjustment
4.4 Summary
5 Providing and Protecting Microdata
5.1 Why Provide Access?
5.2 Confidentiality Concerns
5.3 Why Protect Microdata?
5.4 Restricted Data
5.4.1 In Order to Limit Disclosure, What Shall We Mask?
5.5 Matrix Masking
5.6 Masking Through Suppression
5.7 Local Suppression
5.8 Noise Addition
5.9 Data Swapping
5.9.1 Implementations of Data Swapping
5.9.1.1 An Example
5.9.2 A Protocol for Data Swapping
5.10 Masking Through Sampling
5.11 Masking Through Aggregation
5.11.1 Global Recoding
5.11.2 Topcoding
5.12 Microaggregation
5.13 Synthetic Microdata
5.14 Concluding Thoughts
6 Disclosure Risk and Data Utility
6.1 Basics of Disclosure Risk and Data Utility
6.1.1 Choosing the Parameter Values of an SDL Method
6.2 Data Utility Metrics
6.3 Direct Measurement of Utility
6.4 The R-U Confidentiality Map
6.4.1 Constructing an R-U Confidentiality Map: Multivariate Additive Noise
6.4.2 R-U Confidentiality Map for Topcoding
6.5 Discussion
7 Restrictions on Data Access
7.1 Who Can Have Access?
7.2 Where Can Access Be Obtained?
7.3 What Analysis Is Permitted?
7.4 Modes of Access
7.4.1 Free Access
7.4.2 Delivered Access
7.4.3 Safe Settings
7.4.4 Virtual Access
7.4.5 Licensing
7.5 Conclusion
8 Thoughts on the Future
8.1 New Meanings for Privacy and Statistical Confidentiality
8.2 Who Will Care About Statistical Data?
8.3 What New Forms of Data Stewardship Organizations Will Develop?
8.4 Will Statistical Data Remain Valuable?
8.5 New Data Types
8.5.1 Geospatial Data
8.5.2 Audio and Video Data
8.5.3 Biometric Recognition Data
8.5.4 Biological Material Data
8.5.5 Network Data
8.6 Privacy Preserving Data Mining
8.7 Other New Issues for Statistical Confidentiality
8.7.1 Technological Advances
8.7.2 Increased Expectations About Data Access
8.7.3 Sophisticated Privacy Advocates
8.7.4 New Confidentiality Legislation
8.7.5 Demand for Data from Researchers
8.7.6 Challenges in Communicating Confidentiality Protections
8.8 Will There Be New Forms of Data Snooping?
8.8.1 The Data Snooper of the Future
8.8.2 New Attack Modalities
8.9 What New Strategies of Disclosure Limitation Should Be Developed?
8.10 Finally, an Exciting Vision for Statistical Confidentiality
Glossary
References
Index




